GNNs — Practical difficulties and applications

Filippo Minutella, LARUS
Jan 20, 2022

Introduction

This article was written by me and my colleague Valerio Piccioni.

Using Graphs and AI has proven to be very effective in many areas.

The idea behind this is very simple: we can exploit the node neighborhood to get more information about a node.

It is very important in contexts like Social Networks, Transaction Networks, Product Recommendations, Drug Discovery, and many others.

So let’s start to see how we can use AI on Graphs.

Neural Networks on Graphs

We want to avoid an in-depth discussion on the type of graphs but if you are curious you can find more information about them here.

In this section, we want to focus on how you can apply Neural Networks to Graphs, for example, if you want to classify Nodes, Subgraphs, or entire Graphs.

A Graph
A very simple Neural Network (MLP)

An MLP takes a vector as input, and a node in a graph can be represented as a vector using embedding methods like FastRP, Node2vec (again, see here if you want to deep dive), and so on. Alternatively, you can use the node features, i.e. the properties attached to a node: for example, if a node represents a person in a social network, the person's age is a node feature.

In the last few years, we have seen how specialized neural networks, like CNNs for images and RNNs for text (and, more generally, sequences), have obtained incredible results.

CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks) are trained directly on images and sequences respectively, exploiting the structure of the input. These architectures can be trained end to end, i.e. they also learn how to extract relevant features from the input.

Taking images as an example, we could classify an image as a cat or not by writing some algorithm to extract a feature vector and then training an MLP on that vector to classify the image.

CNNs can be trained to automatically extract the relevant features for our task.

So we want an architecture that can automatically extract relevant features from graphs: this is the Graph Neural Network (GNN).

Graph Neural Network (The message passing framework)

Almost all GNNs are built upon the message passing framework.

This framework allows us to iteratively update the information of a node so that it also contains information about its neighborhood.

It is composed of 3 steps:

  1. The message function computes the messages flowing between neighbors (for example, a trained NN).
  2. The aggregate function aggregates the messages coming from the neighbors (sum, mean, pooling).
  3. The update function updates the state of the node using its old state and the aggregated message (for example, a trained NN).

The above steps are performed in that order, repeatedly, to expand the flow of information in the graph.

Message passing example

The above image illustrates how a step of message passing allows the flow of information.

Take node 2: in the image on the right, it has received the information from its neighborhood (i.e. the nodes to which it is connected). The following step is to aggregate the received vectors into one vector and finally update its state using the old state (the vectors on the left of the image) and the message received.

The above figure presents the message passing framework in a more mathematical view.
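A common formulation (the notation here is generic; exact symbols vary from paper to paper): the state $h_v^{(k)}$ of node $v$ after $k$ steps is updated as

$$
h_v^{(k+1)} = \mathrm{UPDATE}\left(h_v^{(k)},\ \mathrm{AGG}\left(\left\{\mathrm{MSG}\left(h_u^{(k)}\right) : u \in \mathcal{N}(v)\right\}\right)\right)
$$

where $\mathcal{N}(v)$ is the neighborhood of $v$, and MSG, AGG, and UPDATE are the three functions of steps 1-3 above.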

So each layer of a GNN is the application of one step of the message passing framework; this means that by stacking multiple layers we can propagate information farther in the graph.

The first step of the message passing framework can be performed using the node features; the second step (layer) uses the updated state from the previous layer, and so on.

A layer of GCN (Graph Convolutional Network) is presented above; it is one of the most famous GNNs.

A layer of GAT (Graph Attention Network) is presented above; it is state-of-the-art on homogeneous graphs.
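As a rough sketch of both (simplified; see the original papers for the exact formulations), a GCN layer computes

$$
h_v^{(k+1)} = \sigma\left(\sum_{u \in \mathcal{N}(v) \cup \{v\}} \frac{1}{\sqrt{d_u d_v}}\, W^{(k)} h_u^{(k)}\right)
$$

with fixed weights derived from the node degrees $d_u, d_v$, while a GAT layer computes

$$
h_v^{(k+1)} = \sigma\left(\sum_{u \in \mathcal{N}(v)} \alpha_{vu}\, W^{(k)} h_u^{(k)}\right)
$$

where the coefficients $\alpha_{vu}$ are learned via an attention mechanism.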

We can see how they differ in how they weigh the information coming from their neighborhood.

That’s it: we now know how to apply a NN to a homogeneous graph, and soon we will see how to apply a NN to other types of graphs.

Before going further, we want to discuss the GNN layers and the message passing framework a bit more.

In graph theory, the Weisfeiler-Lehman test is a famous test to evaluate whether two graphs are isomorphic, i.e., roughly speaking, whether their structures are the same.

This test performs the same steps as the message passing framework above: it iteratively updates the color of a node using a hash function that takes the neighborhood as input.

In this test, the message passing function is the identity function, and the aggregate and update functions are performed by the hash function.
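A minimal Python sketch of this color refinement (simplified: real implementations relabel the hash values into compact color ids):

```python
from collections import Counter

def wl_colors(adj, num_iters):
    """One run of Weisfeiler-Lehman color refinement.

    adj: dict mapping each node to the list of its neighbors.
    """
    colors = {v: 0 for v in adj}  # start from a uniform color
    for _ in range(num_iters):
        colors = {
            # new color = hash of (own color, multiset of neighbor colors)
            v: hash((colors[v], tuple(sorted(colors[u] for u in adj[v]))))
            for v in adj
        }
    return Counter(colors.values())  # color histogram, comparable across graphs
```

If the color histograms of two graphs differ, the graphs are certainly not isomorphic; if they match, the test is inconclusive.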

In other fields of deep learning, we are used to very deep networks, but it is different for GNNs: stacking too many GNN layers makes all the node embeddings converge to the same value. This problem is called over-smoothing. It happens because in a K-layer GNN each node has a receptive field of its K-hop neighborhood, and the shared neighbors quickly grow as the number of hops increases (as we can see in the figure above), creating very similar embeddings.

How to implement a GNN

Many tools help us build GNNs, the most famous are:

  • Deep Graph Library (DGL), with TF and PyTorch backends
  • PyTorch Geometric
  • TF-GNN (quite new)

We use DGL with PyTorch, for no particular reason except that DGL has many GNNs ready to use and it is, in our opinion, very clear and simple.

So, let’s start and create our first GNN.

We will use Zachary’s karate club dataset as the “hello world” of GNNs.

The karate club is a social network of 34 members, documenting pairwise links between members who interacted outside the club. The club later split into two communities, led by the instructor (node 0) and the club president (node 33). The network is visualized as follows, with the color indicating the community. The task is node classification: the goal is to predict whether a member sides with the president or with the instructor.
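The dataset ships with DGL; a minimal sketch of loading it:

```python
import dgl

# Zachary's karate club comes with DGL as a ready-made dataset
dataset = dgl.data.KarateClubDataset()
g = dataset[0]                 # a DGLGraph with 34 nodes
print(g.num_nodes(), g.num_edges())
print(g.ndata['label'])        # community membership of each node
```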

Below is an example of how a GCN layer can be implemented with DGL.
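A minimal sketch of such a layer, along the lines of DGL’s classic GCN tutorial (details may differ from the original snippet):

```python
import torch.nn as nn
import dgl.function as fn

# Message function: copy the node state 'h' onto the edges as message 'm'
gcn_msg = fn.copy_u('h', 'm')
# Reduce function: sum the incoming messages 'm' into the new state 'h'
gcn_reduce = fn.sum(msg='m', out='h')

class GCNLayer(nn.Module):
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.linear = nn.Linear(in_feats, out_feats)

    def forward(self, g, feature):
        with g.local_scope():
            g.ndata['h'] = feature
            # Trigger message passing on all edges and aggregate the results
            g.update_all(gcn_msg, gcn_reduce)
            h = g.ndata['h']
            return self.linear(h)
```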

The forward method takes as input a graph g and the node features; the most interesting part is the update_all function called on the graph g.

update_all makes the messages flow between nodes and updates the hidden state h of each node. This method takes as input how to create the message (gcn_msg) and how to aggregate the messages (gcn_reduce). Very clear and simple!

Now we can stack multiple GCN layers (here we use GraphConv, the equivalent layer provided by DGL) to create our GCN network.
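A minimal sketch of the stacked network, followed by an illustrative semi-supervised training loop in the spirit of the DGL karate club tutorial (the hyperparameters and the one-hot input features are our choices, not the article’s):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, hidden_size, num_classes):
        super().__init__()
        self.conv1 = GraphConv(in_feats, hidden_size)
        self.conv2 = GraphConv(hidden_size, num_classes)

    def forward(self, g, inputs):
        h = F.relu(self.conv1(g, inputs))
        return self.conv2(g, h)

# Semi-supervised setup: one-hot node ids as input features, and only
# the two leaders labeled (node 0 = instructor, node 33 = president)
net = GCN(34, 16, 2)
inputs = torch.eye(34)
labeled_nodes = torch.tensor([0, 33])
labels = torch.tensor([0, 1])

optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
for epoch in range(50):
    logits = net(g, inputs)            # g from the loading snippet above
    loss = F.cross_entropy(logits[labeled_nodes], labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```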

Real-world graphs

As we can imagine, real-world graphs are certainly bigger and more complex than Zachary’s karate club.

A heterogeneous graph is a special kind of information network which contains multiple types of nodes and/or multiple types of links.

IMDb (an acronym for Internet Movie Database) is an online database of information related to films, television programs, home videos, video games, and streaming content. The dataset used here contains movies, actors, and directors.

Each movie is connected to its one director and its three main actors.

The goal is to classify the movies by genre using the graph.

Graph schema of IMDb Dataset
The Pirates of the Caribbean subgraph (2003–2016)

If we don’t take heterogeneity into account, we lose a lot of information!

(if you want to deep dive into the type of graphs available you can again find it here)
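As an illustration, such a graph can be built in DGL as follows (a toy sketch; the node and edge type names are our own, not the dataset’s):

```python
import dgl
import torch

# A toy heterogeneous graph with an IMDb-style schema; each canonical
# edge type maps to a pair of (source ids, destination ids) tensors
hg = dgl.heterograph({
    ('actor', 'acted_in', 'movie'): (torch.tensor([0, 1, 2]), torch.tensor([0, 0, 1])),
    ('movie', 'acted_by', 'actor'): (torch.tensor([0, 0, 1]), torch.tensor([0, 1, 2])),
    ('director', 'directed', 'movie'): (torch.tensor([0, 1]), torch.tensor([0, 1])),
})
print(hg)  # summary of node types, edge types, and their counts
```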

Heterogeneous Graph Attention Network (HAN)

Two movies can be connected via multiple meta-paths, e.g., Movie-Actor-Movie (MAM) and Movie-Director-Movie (MDM).

A meta-path is a sequence of relations between node types that defines a new composite relation between its starting type and ending type.

Different meta-paths always reveal different semantics.

The Heterogeneous Graph Attention Network (HAN, 2019) introduced two new levels of attention:

  • Node-level attention: nodes are projected into a unified feature space, and the weight of meta-path based node pairs is learned.
  • Semantic-level attention: the weight of each meta-path is learned jointly, and the semantic-specific node embeddings are fused together.

The node-level attention can be seen as a modified GAT applied to the homogeneous graph obtained using the meta-path, while the semantic-level attention allows us to merge the embeddings produced by the different node-level attention modules. (here the paper)
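DGL has a helper for exactly this first step; a hypothetical sketch using the toy graph above, which stores both directions of the actor-movie relation:

```python
import dgl

# Movie-Actor-Movie (MAM): follow movie->actor edges, then actor->movie
# edges; the result is a homogeneous movie-movie graph on which a
# GAT-style node-level attention module can run
mam = dgl.metapath_reachable_graph(hg, ['acted_by', 'acted_in'])
```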

Heterogeneous Graph Transformer (HGT)

The meta-paths approach followed by the HAN framework is not the only one that can be used on heterogeneous graphs.

Another one is the meta-relation approach: for each edge connecting two nodes, its meta-relation is defined by the triplet

<source node type, edge type, destination node type>

For example, in the IMDb graph an edge between an actor and a movie has the meta-relation <actor, acted in, movie> (the edge type name here is illustrative).

Designing meta-paths for each node type of a heterogeneous graph requires specific domain knowledge; the meta-relations, on the other hand, are explicit in the graph structure.

A graph neural network called Heterogeneous Graph Transformer (HGT), whose architecture is inspired by the classic Transformer, automatically learns and extracts meta-paths directly from meta-relations. (here the paper)

Graph Structure Problems

In addition to the heterogeneity of the graph, there are other important properties to consider:

  • Too sparse graphs: this may cause inefficient message passing. In this case, we may want to consider adding virtual nodes/edges:
  1. A common approach is adding virtual edges to connect 2-hop neighbors.
  2. Another common approach is adding virtual nodes connected to all other nodes.
  • Too dense graphs: this may cause too costly message passing. In this case, we may want to sample a node’s neighborhood for message passing. There are many sampling strategies; one approach is to randomly sample the node’s neighborhood.
  • Too large graphs: the graph may not entirely fit into memory. Neighborhood sampling, as used for too-dense graphs, helps here too. Generally, a K-layer GNN generates the embedding of a node using its K-hop neighborhood structure and features, so to compute the embedding of a single node all we need is its K-hop neighborhood (which defines the computation graph). The K-hop neighborhood of each node (or set of nodes) can be seen as a minibatch, which we can build so that it fits into memory (a sketch follows this list).
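A minimal sketch of neighborhood-sampled minibatch training in DGL (the dataloading class names below follow recent DGL versions and should be treated as an assumption; g and train_nids are assumed to exist):

```python
import dgl
import torch

# Sample at most 10 neighbors per node, once per GNN layer (2 layers here)
sampler = dgl.dataloading.NeighborSampler([10, 10])

# train_nids: the ids of the nodes we want to compute predictions for
dataloader = dgl.dataloading.DataLoader(
    g, train_nids, sampler,
    batch_size=1024, shuffle=True, drop_last=False)

for input_nodes, output_nodes, blocks in dataloader:
    # 'blocks' is the sampled K-hop computation graph for this minibatch,
    # one block per layer; small enough to fit into memory
    ...
```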

Sampling in a heterogeneous graph is not trivial: you have to ensure a minimum number of relationships and nodes for each type. In the HGT paper you can find a sampling strategy for this type of graph.

Thanks for reading! If you want to ask us questions or share curiosities, you can reach us on LinkedIn or comment on the article.

References

Machine Learning with Graphs course, Jure Leskovec, Stanford, link

Deep Graph Library, link
