The Computational Complexity of Graph Neural Networks explained

Franziska Lippoldt
5 min read · Sep 10, 2020


Unlike conventional convolutional neural networks, the cost of graph convolutions is “unstable”: the choice of graph representation and edge structure directly determines the complexity of the graph convolution. This article explains why.

Dense vs Sparse Graph Representation: expensive vs cheap?

Graph data for a GNN input can be represented in two ways:

A) sparse: As a list of nodes and a list of edge indices

B) dense: As a list of nodes and an adjacency matrix

For any graph G with N vertices, each carrying a feature vector of length F, and M edges, the sparse version operates on a node feature matrix of size N*F and a list of edge indices of size 2*M. The dense representation, in contrast, requires an adjacency matrix of size N*N.
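To make the two representations concrete, here is a minimal sketch in PyTorch; all sizes and edges below are made up for illustration. The sparse version stores an N x F feature matrix plus a 2 x M edge index, the dense version stores the same features plus an N x N adjacency matrix.

```python
import torch

# Toy graph: N = 4 nodes with F = 3 features each and M = 5 directed edges.
N, F = 4, 3
x = torch.randn(N, F)                         # node feature matrix, size N*F

# Sparse representation: edge index list of shape 2 x M (source row, target row).
edge_index = torch.tensor([[0, 1, 1, 2, 3],
                           [1, 0, 2, 3, 0]])  # size 2*M

# Dense representation: adjacency matrix of size N*N.
adj = torch.zeros(N, N)
adj[edge_index[0], edge_index[1]] = 1.0

print(x.shape, edge_index.shape, adj.shape)   # (4, 3), (2, 5), (4, 4)
```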

While the sparse representation is in general much less expensive to use inside a graph neural network for the forward and backward pass, edge modification operations require searching for the right edge-node pairs and possibly adjusting the overall size of the list, which leads to variations in the RAM usage of the network. In other words, sparse representations minimize memory usage on graphs with fixed edges.

While expensive to use, the dense representation has the following advantages: edge weights are naturally included in the adjacency matrix, edge modification can be done in a smooth manner and integrated into the network, and finding edges or changing edge values does not change the size of the matrix. These properties are crucial for graph neural networks that rely on in-network edge modifications.
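As a rough illustration, continuing the toy tensors from the sketch above: modifying an edge in the dense representation is a single write into the adjacency matrix, while in the sparse representation it means growing or searching the edge list.

```python
import torch

# Dense: re-weighting or adding an edge is an in-place write; the matrix keeps
# its N*N shape, and an adjacency produced by the network can stay differentiable.
adj[2, 0] = 0.5

# Sparse: adding an edge means growing the edge index list (and any parallel
# edge-weight list), which changes tensor sizes and needs duplicate checks.
new_edge = torch.tensor([[2], [0]])
edge_index = torch.cat([edge_index, new_edge], dim=1)   # now shape 2 x (M + 1)
```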

Taking GNN applications into perspective, the decision between sparse and dense representations can be formulated into two questions:

  1. Is there an advantage in computational complexity, i.e. what is the relation between the number of nodes and the number of edges per node? Is 2*M much smaller than N*N?
  2. Does the application require differentiable edge modifications?

While the first question is straightforward to answer after inspecting the graphs, the second question depends on the structure of the neural network as well as the complexity of the graph. For now, it appears that large graphs (not talking about toy examples here) benefit from pooling and normalisation/stabilization layers between layers of graph convolutions.
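For the first question, a back-of-the-envelope comparison is usually enough. The helper below is only a sketch (the function name and the example numbers are hypothetical) counting stored elements for both representations:

```python
def element_counts(N: int, M: int, F: int):
    """Rough element counts, not a precise memory model:
    sparse = node features + edge index list, dense = node features + adjacency."""
    return N * F + 2 * M, N * F + N * N

# Hypothetical graph: 100k nodes, 64-dim features, roughly 10 edges per node.
sparse, dense = element_counts(N=100_000, M=1_000_000, F=64)
print(sparse, dense)   # ~8.4e6 vs ~1.0e10 elements, so here 2*M << N*N
```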

Edges and complexity of convolution

While some graphs arise naturally from the data through well-defined relations between nodes, for example an interaction graph of transactions between accounts in the last month, more complex problem statements, especially for dense representations, do not naturally come with a clear edge assignment for each vertex.

If the edge assignment is not given through the data, different edge variations will lead to different behaviour and memory costs of the graph neural network.

For this, it makes sense to start with an array as the graph representation; this could be an image, for example. For an array of size MxN, the smallest number of edges that connects every node with every other node through several hops is obtained by connecting each node with the previous and the next node. In this case there are M*N nodes and 2*M*N edges. A simple one-hop convolution performs 2*M*N operations. However, this operation would be “slow”: for the node information in the middle of the array to reach the first node of the array, around 0.5*M*N convolutions are required.
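A small sketch of this minimal “chain” connectivity (the helper below is illustrative, not from the article): nodes are the flattened array indices, and each node is connected to its predecessor and successor.

```python
import torch

# Minimal chain connectivity for an M x N array flattened to M*N nodes:
# each node is linked to the previous and the next flat index, giving
# roughly 2*M*N directed edges.
def chain_edges(M: int, N: int) -> torch.Tensor:
    idx = torch.arange(M * N)
    src = torch.cat([idx[:-1], idx[1:]])      # node -> next and node -> previous
    dst = torch.cat([idx[1:], idx[:-1]])
    return torch.stack([src, dst])            # shape 2 x 2*(M*N - 1)

print(chain_edges(4, 5).shape[1])             # 38 edges for a 4 x 5 array
```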

A typical approach chosen for graph convolutions on images is to take the 8 direct neighbours into account for the edge connections; in this case there are 8*M*N edges, hence each simple graph convolution has a cost of 8*M*N. For information from the center node of the array to reach the first node, being able to walk diagonally, this takes around 0.5*max(M,N) convolutions.
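The 8-neighbour edge set can be built in a similar way; the sketch below is illustrative, and boundary nodes simply have fewer than 8 neighbours, so the exact count stays slightly below 8*M*N.

```python
import torch

# 8-neighbour connectivity for an M x N image-like grid.
def grid8_edges(M: int, N: int) -> torch.Tensor:
    edges = []
    for r in range(M):
        for c in range(N):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (dr, dc) == (0, 0):
                        continue
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < M and 0 <= cc < N:
                        edges.append((r * N + c, rr * N + cc))
    return torch.tensor(edges).t()            # shape 2 x (number of edges)

print(grid8_edges(4, 5).shape[1])             # 110 edges; 8*M*N would be 160
```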

For so-called self-attention-based approaches, every node would be connected to every other node. While this requires (M*N)*(M*N) edges, during one convolutional operation the information of any node can be transmitted to any other node.

From all three examples above, it becomes clear that the number of edges determines the complexity of the convolution. This is unlike conventional convolutional layers, where filter sizes often come in a 3x3 format and are determined by the network design, not by the image input.
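Plugging in a hypothetical 32x32 image makes the gap between the three connectivity choices explicit, since the edge count is proportional to the per-convolution cost:

```python
# Per-convolution cost for the three connectivity choices above.
M, N = 32, 32
chain = 2 * M * N            # 2,048
neighbours8 = 8 * M * N      # 8,192 (ignoring boundary effects)
full = (M * N) ** 2          # 1,048,576, the self-attention case
print(chain, neighbours8, full)
```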

Additional Info: Dense vs Sparse Convolutions

The choice of a dense or sparse representation not only affects the memory usage but also the calculation method. Dense and sparse graph tensors require graph convolutions that operate on dense or sparse inputs (or, alternatively, as seen in some implementations, convert between sparse and dense inside the network layer). Sparse graph tensors rely on sparse convolutions built from sparse operations. From a very naive point of view it would be logical to assume that dense computations are more expensive but faster than sparse ones, because sparse graphs require processing the operations in the shape of a list. However, libraries for sparse tensor operations are available for both PyTorch and TensorFlow that simplify and speed up sparse operations.
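A minimal sketch of this difference, using the aggregation step A @ X that sits at the heart of many graph convolutions: the dense variant multiplies with the full N x N adjacency, while the sparse variant (torch.sparse here; TensorFlow offers tf.sparse.sparse_dense_matmul) only touches the nonzero entries. Sizes are made up for illustration.

```python
import torch

# Dense vs sparse aggregation A @ X.
N, F = 1000, 16
x = torch.randn(N, F)
adj_dense = (torch.rand(N, N) < 0.01).float()   # ~1% of all possible edges

out_dense = adj_dense @ x                        # full N*N matrix multiply
adj_sparse = adj_dense.to_sparse()               # COO sparse tensor
out_sparse = torch.sparse.mm(adj_sparse, x)      # touches only the nonzeros

print(torch.allclose(out_dense, out_sparse, atol=1e-5))   # same result
```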

Additional Info: Towards other edge criteria

From grid to k-nearest neighbours: if the graph’s nodes are not arranged in a grid as in the example images, a generalization of this approach is to find the k nearest neighbours for each node. This approach can be further generalized by using feature vectors instead of positions as node coordinates, with the “distance” function in the k-NN setup serving as a similarity measurement between features.
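One possible sketch of such a k-nearest-neighbour edge construction (the helper below is hypothetical; torch.cdist could be swapped for any feature-similarity measure):

```python
import torch

# k-nearest-neighbour edges from node coordinates or node feature vectors.
def knn_edges(pos: torch.Tensor, k: int) -> torch.Tensor:
    dist = torch.cdist(pos, pos)                  # pairwise Euclidean distances
    dist.fill_diagonal_(float("inf"))             # exclude self-loops
    nbrs = dist.topk(k, largest=False).indices    # k closest nodes per node
    src = torch.arange(pos.size(0)).repeat_interleave(k)
    return torch.stack([src, nbrs.reshape(-1)])   # edge index, shape 2 x N*k

pos = torch.rand(100, 2)                          # hypothetical 2D positions
edge_index = knn_edges(pos, k=8)
```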

From Euclidean distance to other evaluation methods: While some graph network applications benefit from physical coordinates or pixel positions (such as traffic prediction or image analysis), other graphs and edge relations might arise from similarity criteria based on connections or knowledge, such as social networks or transaction graphs.

Related resources:

  1. Sparse tensors in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/sparse/SparseTensor
  2. Sparse tensors in PyTorch: https://pytorch.org/docs/stable/sparse.html
  3. All graph sketches have been plotted with NetworkX: https://networkx.github.io

This article expresses the author’s own opinion and is not necessarily in accordance with the opinion of the company the author is working for.
