What?

Structural message passing (SMP), a strictly more powerful message-passing network that relies on node identifiers while remaining permutation equivariant and generalisable at the same time.

Why?

Message Passing Neural Networks (MPNNs) are useful, but they have limited representational power. We can do better.

How?

TL;DR Use matrix-based messages and propagate them in a permutation equivariant way.

In a typical MPNN, a node has a feature vector $x_i \in \mathbb{R}^c$. In SMP, each node instead maintains a local context matrix $U_i \in \mathbb{R}^{n \times c}$: the j-th row of $U_i$ holds a c-dimensional representation of node $v_j$, as seen from $v_i$. If we reorder the nodes, the rows of each $U_i$ are permuted accordingly.

The local context of node $v_i$ is initialised as a one-hot encoding of $v_i$. If the node has other features, they are appended to its own row of the matrix:

import numpy as np

# features of node A at initialisation for a graph A -> B -> C
one_hot = np.array([[1.0], [0.0], [0.0]])  # one-hot identifier of node A
x_A = np.array([3.14, 42.0])               # extra features of node A

# place the features in A's own row, zeros elsewhere
features = np.zeros((3, 2))
features[0] = x_A

# concatenate
U_A = np.concatenate([one_hot, features], axis=1)
# U_A = [[1, 3.14, 42],
#        [0, 0,    0 ],
#        [0, 0,    0 ]]

An update step is done as follows:

$$ U_i^{(l+1)} = u^{(l)}\left( U_i^{(l)}, \tilde{U}_i^{(l)} \right) \in \mathbb{R}^{n \times c_{l+1}}, $$

$$ \tilde{U}_i^{(l)} = \phi\left( \{ m^{(l)}( U_i^{(l)}, U_j^{(l)}, y_{ij} ) \}_{v_j \in N(v_i)} \right), $$

where $u$ is the node updater, $m$ is the edge updater (message function), which also takes edge features $y_{ij}$, and $\phi$ is a permutation-invariant aggregator over the neighbourhood $N(v_i)$.
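
To make this concrete, here is a minimal numpy sketch of one update step, assuming sum aggregation for $\phi$, plain linear maps for $m$ and $u$, no edge features, and messages that depend only on the sender (the names W_m and W_u are my own; the paper's actual parametrisation is richer):

import numpy as np

def smp_layer(U, A, W_m, W_u):
    # U: (n, n, c) stack of local contexts; U[i] is node i's n x c matrix
    # A: (n, n) adjacency matrix
    messages = U @ W_m                                      # m: linear map of the sender's context
    U_tilde = np.einsum('ij,jkc->ikc', A, messages)         # phi: sum messages over neighbours
    combined = np.concatenate([U @ W_u, U_tilde], axis=-1)  # u: combine self and aggregated contexts
    return np.maximum(combined, 0.0)                        # nonlinearity

The same shared functions are applied to every node, which is what keeps the whole scheme permutation equivariant.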

A cool trick here: when we sum the neighbours' local context matrices, we effectively compute powers of the adjacency matrix, which allows detecting topological features such as cycles. 🤯🤯🤯
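
A quick sanity check of this (my own illustration, not from the paper): start from the one-hot contexts and repeatedly sum the neighbours' contexts; after k rounds each context holds rows of $A^k$:

import numpy as np

# undirected triangle graph
A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])

U = np.eye(3)[..., None]                 # one-hot contexts, shape (n, n, 1)
U1 = np.einsum('ij,jkc->ikc', A, U)      # one aggregation round
U2 = np.einsum('ij,jkc->ikc', A, U1)     # two rounds

assert np.allclose(U1[..., 0], A)        # contexts now encode A
assert np.allclose(U2[..., 0], A @ A)    # then A^2, whose entries count 2-walks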

The authors give sufficient conditions for SMP to be equivariant, which defines a class of functions that can be used inside SMP. Intuitively, if we view the rows of $U_i$ as a set of node representations, any permutation-equivariant neural network for sets will do the job. Since such networks do not depend on a fixed input size, SMP can also be trained on graphs of arbitrary sizes. In addition, SMP is invariant to local isomorphisms: if the k-hop neighbourhoods of $v_i$ and $v_j$ are isomorphic, a k-layer SMP will yield the same result for these vertices in a node classification task.
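
For intuition, here is a minimal sketch of one such set layer (a DeepSets-style linear layer; my hypothetical example, not the paper's architecture). Each row gets the same shared transform plus a function of a pooled summary, so permuting the input rows permutes the output rows identically, and nothing depends on a fixed $n$:

import numpy as np

def equivariant_set_layer(U, W1, W2):
    # U: (n, c) local context, treated as a set of n rows
    pooled = U.mean(axis=0, keepdims=True)   # (1, c) permutation-invariant summary
    return U @ W1 + pooled @ W2              # (n, c_out), equivariant in the rows

# sanity check: permuting the input rows permutes the output rows the same way
U = np.random.randn(4, 3)
W1, W2 = np.random.randn(3, 5), np.random.randn(3, 5)
perm = np.random.permutation(4)
assert np.allclose(equivariant_set_layer(U, W1, W2)[perm],
                   equivariant_set_layer(U[perm], W1, W2))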

The authors also prove that SMP, if properly parameterised, can map non-isomorphic graphs to different representations, which in turn implies universality. However, the proof is not constructive → we still have to find a practical parametrisation that is universal. What the authors do show constructively is that SMP can simulate any MPNN with the same number of layers, but not the other way round → SMP is strictly more powerful than MPNNs.

And?