# Notes on: Schlichtkrull, M., Kipf, T. N., Bloem, P., Berg, R. v. d., Titov, I., & Welling, M. (2017): Modeling Relational Data With Graph Convolutional Networks

## Notation

- $v_i \in \mathcal{V}$ denotes **nodes (entities)**
- $(v_i, r, v_j) \in \mathcal{E}$ denotes **labeled edges (relations)**
- $r \in \mathcal{R}$ denotes a **relational type**
- $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathcal{R})$ denotes a directed and labeled **multi-graph**
- a **multi-graph** means a graph which has multiple "layers" for each vertex (you can view them as multiple "identical" graphs stacked on top of each other)
- $h_i^{(l)}$ is the **hidden state of node** $v_i$ in the $l$-th layer of the neural network
- $d^{(l)}$ is the **dimensionality of the $l$-th layer**
- $g_m(\cdot, \cdot)$ represents a "message" between two different vertices, taking as input the hidden representations of these two vertices in the $l$-th layer, e.g. $g_m\big(h_i^{(l)}, h_j^{(l)}\big) = W h_j^{(l)}$
- $\mathcal{M}_i$ denotes the set of incoming messages for node $v_i$ and is often chosen to be identical to the set of incoming edges
- $\sigma(\cdot)$ denotes an element-wise **activation function**, e.g. $\mathrm{ReLU}(\cdot) = \max(0, \cdot)$
- $\mathcal{N}_i^r$ denotes the set of neighbor indices of node $i$ under relation $r \in \mathcal{R}$
- $c_{i,r}$ is a problem-specific normalization constant that can either be learned or chosen in advance, e.g. $c_{i,r} = |\mathcal{N}_i^r|$

## Neural relational modeling

### Relational graph convolutional networks

After transforming the propagation rule into matrix multiplication, I realize that for a single relation type ($|\mathcal{R}| = 1$) it's exactly the same as kipf16_semi_super_class_with_graph_convol_networ :)

Normal / regular **GCNs** can be understood as a simple differentiable message-passing framework, where messages of the form $g_m\big(h_i^{(l)}, h_j^{(l)}\big)$ are accumulated and passed through an activation function:

$$h_i^{(l+1)} = \sigma\left( \sum_{m \in \mathcal{M}_i} g_m\big(h_i^{(l)}, h_j^{(l)}\big) \right)$$

What follows refer to the proposed **relational graph convolutional networks (R-GCNs)**:

Propagation model for calculating the forward-pass update of an entity or node denoted by $v_i$ in a relational (directed and labeled) multi-graph:

$$h_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)$$

**Intuitively:**

- Accumulates transformed feature vectors of neighboring nodes through a normalized sum
- Different from **regular GCNs**, they introduce *relation-specific* transformations $W_r^{(l)}$, i.e. depending on the type and direction of an edge
- To allow the representation of a node at layer $l+1$ to be informed by the corresponding representation at layer $l$, they add a "special" self-connection relation type to each node in the data, which connects $v_i$ at layer $l$ to $v_i$ at layer $l+1$, i.e. the $W_0^{(l)} h_i^{(l)}$ term relates the representation of $v_i$ in the two layers
- Can be implemented efficiently using sparse matrix multiplication
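As a sketch, the propagation rule can be written in a few lines of NumPy. All sizes, names, and the random data below are hypothetical toy values, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: N nodes, R relations, d_in -> d_out features.
N, R, d_in, d_out = 4, 2, 8, 5

H = rng.normal(size=(N, d_in))          # hidden states h_i^(l) as rows
A = rng.random((R, N, N)) < 0.4         # one binary adjacency matrix per relation
W = rng.normal(size=(R, d_in, d_out))   # relation-specific weights W_r^(l)
W0 = rng.normal(size=(d_in, d_out))     # self-connection weight W_0^(l)

def rgcn_layer(H, A, W, W0):
    """One R-GCN propagation step: normalized sum of transformed
    neighbor vectors per relation, plus a self-connection, then ReLU."""
    out = H @ W0                        # W_0^(l) h_i^(l) term
    for r in range(A.shape[0]):
        # c_{i,r} = |N_i^r| (clipped to 1 to avoid division by zero)
        c = np.maximum(A[r].sum(axis=1, keepdims=True), 1)
        out += (A[r] / c) @ H @ W[r]    # sum_j (1/c_{i,r}) W_r^(l) h_j^(l)
    return np.maximum(out, 0)           # element-wise ReLU

H_next = rgcn_layer(H, A, W, W0)
print(H_next.shape)  # (4, 5)
```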

### Regularization

#### Basis-decomposition

Define $W_r^{(l)}$ as follows:

$$W_r^{(l)} = \sum_{b=1}^{B} a_{rb}^{(l)} V_b^{(l)}$$

i.e. as a linear combination of basis transformations $V_b^{(l)} \in \mathbb{R}^{d^{(l+1)} \times d^{(l)}}$ with coefficients $a_{rb}^{(l)}$ such that only the coefficients depend on $r$.

Can be seen as effective weight-sharing between different relation types.
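A minimal sketch of the basis decomposition in NumPy (the sizes below are made-up toy values; the point is that the parameter count drops from $|\mathcal{R}| \, d^{(l+1)} d^{(l)}$ to $B \, d^{(l+1)} d^{(l)} + |\mathcal{R}| B$):

```python
import numpy as np

rng = np.random.default_rng(0)
R, B, d_in, d_out = 10, 3, 8, 5          # hypothetical sizes; B << R bases

V = rng.normal(size=(B, d_in, d_out))    # shared basis transformations V_b
a = rng.normal(size=(R, B))              # per-relation coefficients a_{rb}

# W_r = sum_b a_{rb} V_b, computed for every relation at once
W = np.einsum("rb,bio->rio", a, V)       # shape (R, d_in, d_out)

# Parameters: R*d_in*d_out without sharing vs. B*d_in*d_out + R*B with
print(R * d_in * d_out, "->", B * d_in * d_out + R * B)  # 400 -> 150
```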

#### Block-diagonal decomposition

Define $W_r^{(l)}$ as a direct sum over low-dimensional matrices:

$$W_r^{(l)} = \bigoplus_{b=1}^{B} Q_{br}^{(l)}$$

Thus, the $W_r^{(l)}$ are *block-diagonal* matrices:

$$W_r^{(l)} = \operatorname{diag}\big(Q_{1r}^{(l)}, \ldots, Q_{Br}^{(l)}\big)$$

with $Q_{br}^{(l)} \in \mathbb{R}^{(d^{(l+1)}/B) \times (d^{(l)}/B)}$.

Can be seen as sparsity constraint on the weight matrices for each relation type.

Also, this decomposition encodes an intuition that latent features can be grouped into sets of variables which are more tightly coupled *within* groups than *across* groups. (Why? Because the off-block entries of $W_r^{(l)}$ are exactly zero, features belonging to different blocks cannot interact directly within a single layer.)
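A quick sketch of how such a block-diagonal $W_r$ can be assembled (toy sizes, assuming $B$ divides both dimensions; `scipy.linalg.block_diag` does the stacking):

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(0)
B, d_in, d_out = 4, 8, 8                 # hypothetical; B must divide d_in, d_out

# Low-dimensional blocks Q_{br} of shape (d_in/B, d_out/B) for one relation r
Q = rng.normal(size=(B, d_in // B, d_out // B))

W_r = block_diag(*Q)                     # W_r = diag(Q_1r, ..., Q_Br)
print(W_r.shape)                         # (8, 8)

# Off-block entries are exactly zero: features only interact within a block
assert W_r[0, d_out // B] == 0.0
```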

### Model

- Stack layers as defined in the propagation rule
- Output of the previous layer is input to the next layer

## Ideas

### Apply R-GCN to Netflix recommendations

- Can think of it as trying to figure out which movies you "should have seen already" but haven't, i.e. *link prediction*

## Implementation

### Layer


The propagation rule as matrix multiplication over sparse matrices:

- Adjacency (sparse) matrix $A_r$ to sum over neighbors (one for each relation $r \in \mathcal{R}$)
- Relation-specific (sparse) weight matrix $W_r^{(l)}$ to sum over relations

Then

$$H^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} W_r^{(l)} H^{(l)} A_r + W_0^{(l)} H^{(l)} \right)$$

(absorbing the normalization constants $c_{i,r}$ into $A_r$).

For each relation $r$ we can then write $W_r^{(l)} H^{(l)} a_i^r$, where:

- $a_i^r$ is a one-hot (binary) vector representing the neighbors of $v_i$ under relation $r$, i.e. the $i$-th column of the *adjacency matrix* $A_r$
- $H^{(l)}$ is a matrix with the hidden vectors $h_j^{(l)}$ as the columns

Thus,

$$H^{(l)} a_i^r = \sum_{j \in \mathcal{N}_i^r} h_j^{(l)}$$

which ends up being $W_r^{(l)} H^{(l)} a_i^r = \sum_{j \in \mathcal{N}_i^r} W_r^{(l)} h_j^{(l)}$.

Thus we get the per-relation term of the propagation rule above for each $r \in \mathcal{R}$.
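As a sketch with `scipy.sparse` (toy graph and sizes are made up; note the code uses the row convention, i.e. hidden vectors as rows of $H$, so the multiplication order is transposed relative to the column-convention derivation):

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
N, d_in, d_out = 5, 8, 4

# Hypothetical toy graph: sparse adjacency for a single relation r,
# with edges (source, neighbor)
edges = [(0, 1), (0, 2), (3, 4), (4, 0)]
rows, cols = zip(*edges)
A_r = sp.csr_matrix((np.ones(len(edges)), (rows, cols)), shape=(N, N))

H = rng.normal(size=(N, d_in))       # hidden vectors as rows
W_r = rng.normal(size=(d_in, d_out)) # relation-specific weight matrix

# A_r @ H sums the neighbor vectors of each node; the result is then
# transformed by W_r -- the per-relation term of the propagation rule
out_r = A_r @ H @ W_r
print(out_r.shape)  # (5, 4)
```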

Using the basis-decomposition, we rewrite the weight matrices:

$$W_r^{(l)} = \sum_{b=1}^{B} a_{rb}^{(l)} V_b^{(l)}$$