Notes on: Schlichtkrull, M., Kipf, T. N., Bloem, P., Berg, R. v. d., Titov, I., & Welling, M. (2017): Modeling Relational Data With Graph Convolutional Networks

Notation

  • $v_i \in \mathcal{V}$ denotes nodes (entities)
  • $(v_i, r, v_j) \in \mathcal{E}$ denotes labeled edges (relations)
  • $r \in \mathcal{R}$ denotes a relation type
  • $G = (\mathcal{V}, \mathcal{E}, \mathcal{R})$ denotes a directed and labeled multi-graph
  • a multi-graph is a graph that allows multiple (here: differently labeled) edges between the same pair of vertices; you can view it as one graph per relation type, stacked on top of each other
  • $h_i^{(l)} \in \mathbb{R}^{d^{(l)}}$ is the hidden state of node $v_i$ in the $l$-th layer of the neural network
  • $d^{(l)}$ is the dimensionality of the $l$-th layer
  • $g_m(\cdot, \cdot)$ represents a "message" between two different vertices, taking as input the hidden representations of these two vertices in the $l$-th layer, e.g. $g_m\big(h_i^{(l)}, h_j^{(l)}\big)$
  • $\mathcal{M}_i$ denotes the set of incoming messages for node $v_i$ and is often chosen to be identical to the set of incoming edges
  • element-wise activation function $\sigma(\cdot)$
  • $\mathcal{N}_i^r$ denotes the set of neighbor indices of node $i$ under relation $r \in \mathcal{R}$
  • $c_{i,r}$ is a problem-specific normalization constant that can either be learned or chosen in advance, e.g. $c_{i,r} = |\mathcal{N}_i^r|$

Neural relational modeling

Relational graph convolutional networks

After transforming the propagation rule into matrix multiplication, I realize that for a single relation type, i.e. $|\mathcal{R}| = 1$, it's exactly the same as kipf16_semi_super_class_with_graph_convol_networ :)
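To make that concrete (my own rewriting, with the self-connection folded into a normalized adjacency matrix $\hat{A}$ that includes self-loops, and node states as the rows of $H^{(l)}$):

$$H^{(l+1)} = \sigma\big( \hat{A} H^{(l)} W^{(l)} \big)$$

which is the propagation rule of kipf16_semi_super_class_with_graph_convol_networ, up to the choice of normalization ($1/c_{i,r}$ here versus the symmetric $\tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ there).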

  • Normal / regular GCNs can be understood as a simple differentiable message-passing framework

    $$h_i^{(l+1)} = \sigma\left( \sum_{m \in \mathcal{M}_i} g_m\big(h_i^{(l)}, h_j^{(l)}\big) \right)$$

    where the messages of the form $g_m\big(h_i^{(l)}, h_j^{(l)}\big)$, typically just a linear transformation $W h_j^{(l)}$, are accumulated and passed through an element-wise activation function $\sigma(\cdot)$

What follows refers to the proposed relational graph convolutional networks (R-GCNs):

  • Propagation model for calculating the forward-pass update of an entity or node denoted by $v_i$ in a relational (directed and labeled) multi-graph:

    $$h_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right)$$

  • Intuitively:
    • Accumulates transformed feature vectors of neighboring nodes through a normalized sum
  • Different from regular GCNs, they introduce relation-specific transformations, i.e. depending on the type and direction of an edge
  • To allow the representation of a node at layer $l + 1$ to be informed by the corresponding representation at layer $l$, they add a "special" relation type (a self-connection) to each node in the data, which connects $h_i^{(l)}$ to $h_i^{(l+1)}$, i.e. the representation of $v_i$ in the two layers.
  • Can be implemented efficiently using sparse matrix multiplication (see the sketch below)

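A minimal dense sketch of this propagation rule (my own code, not the authors' implementation; rgcn_layer and its argument layout are assumptions, and node states live in the rows of H here, unlike the column convention used in the Implementation section below):

```python
import numpy as np

def rgcn_layer(H, A, W, W0, sigma=lambda x: np.maximum(x, 0)):
    """One R-GCN layer, computed densely and naively.

    H  : (N, d_in)         node representations h_i^{(l)} (as rows)
    A  : (R, N, N)         one adjacency matrix per relation, A[r, i, j] = 1 if j is a neighbor of i under r
    W  : (R, d_in, d_out)  relation-specific weights W_r^{(l)}
    W0 : (d_in, d_out)     self-connection weight W_0^{(l)}
    """
    out = H @ W0                                  # self-connection term W_0 h_i
    for r in range(A.shape[0]):
        c = A[r].sum(axis=1, keepdims=True)       # c_{i,r} = |N_i^r|
        c[c == 0] = 1                             # avoid division by zero for nodes without neighbors
        out += (A[r] / c) @ H @ W[r]              # (1 / c_{i,r}) * sum_j W_r h_j
    return sigma(out)
```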

Regularization

Basis-decomposition

Define $W_r^{(l)}$ as follows:

$$W_r^{(l)} = \sum_{b=1}^{B} a_{rb}^{(l)} V_b^{(l)}$$

i.e. as a linear combination of basis transformations $V_b^{(l)} \in \mathbb{R}^{d^{(l+1)} \times d^{(l)}}$ with coefficients $a_{rb}^{(l)}$ such that only the coefficients depend on $r$.

Can be seen as effective weight-sharing between different relation types.
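A one-line sketch of assembling the per-relation weights from the shared bases (my own helper, basis_weights is not from the paper; it uses the in-by-out shape convention of the sketches above):

```python
import numpy as np

def basis_weights(a, V):
    """Build W_r = sum_b a[r, b] * V[b] for all relations at once.

    a : (R, B)             coefficients a_{rb}; the only relation-dependent parameters
    V : (B, d_in, d_out)   shared basis transformations V_b
    returns (R, d_in, d_out)
    """
    return np.einsum('rb,bio->rio', a, V)
```

Only the coefficients grow with the number of relations ($R \cdot B$ scalars), while the $B$ bases are shared, which is the weight-sharing mentioned above.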

Block-diagonal decomposition

Define $W_r^{(l)}$ as a direct sum over low-dimensional matrices:

$$W_r^{(l)} = \bigoplus_{b=1}^{B} Q_{br}^{(l)}$$

Thus, the $W_r^{(l)}$ are block-diagonal matrices:

$$W_r^{(l)} = \operatorname{diag}\big( Q_{1r}^{(l)}, \ldots, Q_{Br}^{(l)} \big)$$

with $Q_{br}^{(l)} \in \mathbb{R}^{(d^{(l+1)}/B) \times (d^{(l)}/B)}$.

Can be seen as sparsity constraint on the weight matrices for each relation type.

Also, this decomposition encodes an intuition that latent features can be grouped into a set of variables which are more tightly coupled within groups than across groups. (why?)
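On the "(why?)": my reading is that all off-block entries of $W_r^{(l)}$ are zero, so an output dimension in block $b$ only receives input from the input dimensions of the same block $b$; latent features therefore interact within their group but not across groups. A small sketch of assembling such a weight matrix (my own helper, using scipy's block_diag):

```python
import numpy as np
from scipy.linalg import block_diag

def block_diag_weight(Q_blocks):
    """Assemble W_r = diag(Q_1r, ..., Q_Br) from B low-dimensional blocks.

    Q_blocks : list of B arrays, each of shape (d_in / B, d_out / B)
    returns a (d_in, d_out) block-diagonal matrix (zeros everywhere off the blocks)
    """
    return block_diag(*Q_blocks)

# e.g. B = 2 blocks of size 3x3 give a 6x6 weight matrix
W_r = block_diag_weight([np.random.randn(3, 3), np.random.randn(3, 3)])
```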


Model

  1. Stack $L$ layers as defined in the propagation rule
  2. Output of the previous layer is input to the next layer
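A short sketch of the stacking, reusing the hypothetical rgcn_layer from the sketch above:

```python
import numpy as np

def rgcn_forward(H, A, layer_params, sigma=lambda x: np.maximum(x, 0)):
    """Stack L R-GCN layers; the output of layer l is the input to layer l + 1.

    layer_params : list of (W, W0) pairs, one per layer; shapes must chain d^{(l)} -> d^{(l+1)}
    """
    for W, W0 in layer_params:
        H = rgcn_layer(H, A, W, W0, sigma)
    return H
```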

Ideas

Apply R-GCN to Netflix recommendations

  • Can think of it as trying to figure out which movies you "should have seen already" but haven't, i.e. link prediction

Implementation

Layer

  • [X] The propagation rule as matrix multiplication over sparse matrices
    • Adjacency (sparse) matrix to sum over neighbors (one for each relation $r$): $A_r \in \{0, 1\}^{|\mathcal{V}| \times |\mathcal{V}|}$
    • Relation (sparse) matrix to sum over relations

Then

$$h_i^{(l+1)} = \sigma\left( \sum_{r \in \mathcal{R}} \frac{1}{c_{i,r}} W_r^{(l)} H^{(l)} a_i^r + W_0^{(l)} h_i^{(l)} \right)$$

For each relation we can then write:

$$\sum_{j \in \mathcal{N}_i^r} \frac{1}{c_{i,r}} W_r^{(l)} h_j^{(l)} = \frac{1}{c_{i,r}} W_r^{(l)} H^{(l)} a_i^r$$

where:

  • $a_i^r \in \{0, 1\}^{|\mathcal{V}|}$ is a binary indicator vector marking the neighbors of $v_i$ under relation $r$, i.e. the corresponding column of the adjacency matrix $A_r$
  • $H^{(l)}$ is a matrix with the hidden vectors $h_j^{(l)}$ as the columns

Thus,

$$\underbrace{W_r^{(l)}}_{d^{(l+1)} \times d^{(l)}} \; \underbrace{H^{(l)}}_{d^{(l)} \times |\mathcal{V}|} \; \underbrace{a_i^r}_{|\mathcal{V}| \times 1}$$

which ends up being a $d^{(l+1)} \times 1$ vector.

Thus we get the above equation for each node $v_i$ and relation $r$.
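A sparse sketch of the same computation (my own code, using scipy.sparse; note that, unlike the derivation above, node states are kept as the rows of H here):

```python
import numpy as np
import scipy.sparse as sp

def rgcn_layer_sparse(H, A_list, W, W0, sigma=lambda x: np.maximum(x, 0)):
    """Compute sigma( sum_r D_r^{-1} A_r H W_r + H W_0 ) with sparse adjacencies.

    H      : (N, d_in) dense node features (as rows)
    A_list : list of scipy.sparse (N, N) adjacency matrices, one per relation
    W      : (R, d_in, d_out) relation-specific weights
    W0     : (d_in, d_out) self-connection weight
    """
    out = H @ W0
    for r, A_r in enumerate(A_list):
        deg = np.asarray(A_r.sum(axis=1)).ravel()              # c_{i,r} = |N_i^r|
        inv = np.where(deg > 0, 1.0 / np.maximum(deg, 1), 0.0)
        A_hat = sp.diags(inv) @ A_r                             # row-normalized adjacency
        out += A_hat @ (H @ W[r])                               # sparse @ dense, efficient
    return sigma(out)
```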

Using the basis-decomposition, we rewrite the weight-matrices:

$$\frac{1}{c_{i,r}} W_r^{(l)} H^{(l)} a_i^r = \frac{1}{c_{i,r}} \sum_{b=1}^{B} a_{rb}^{(l)} V_b^{(l)} H^{(l)} a_i^r$$
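With this factorization, the basis projections can be computed once and reused for every relation; a hedged sketch of that arrangement (my own, reusing the row convention of the sparse sketch above; not necessarily how the authors' code does it):

```python
import numpy as np
import scipy.sparse as sp

def rgcn_layer_basis(H, A_list, a, V, W0, sigma=lambda x: np.maximum(x, 0)):
    """Basis-decomposed layer: W_r = sum_b a[r, b] V[b], without ever forming W_r.

    a : (R, B) coefficients, V : (B, d_in, d_out) shared bases
    """
    out = H @ W0
    HV = np.stack([H @ V[b] for b in range(V.shape[0])])    # (B, N, d_out), computed once
    for r, A_r in enumerate(A_list):
        deg = np.asarray(A_r.sum(axis=1)).ravel()
        inv = np.where(deg > 0, 1.0 / np.maximum(deg, 1), 0.0)
        msg = np.einsum('b,bnd->nd', a[r], HV)               # equals H @ W_r
        out += sp.diags(inv) @ A_r @ msg
    return sigma(out)
```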