Spektral: Streamlining Graph Convolution Networks
relationships are complicated
The connections between data can often tell us more than the data itself.
Nothing in this world exists in a vacuum — everything is a part of something else, every piece of information is interlinked with other data. Ignore context at your own risk.
But since graphs are everywhere — and there’s no shortage of ways to record them as simple data structures — how do we go about analyzing these graphs?
Well, you could start by feeding them into a neural network, experimenting until something goes horribly wrong, and then trying again with more graphs. This is called “data science”.
Graph Neural Libraries
Now let’s take a look at Spektral, a Python deep learning library built on top of TensorFlow and Keras. Considering the power behind it, installation is relatively simple; however, we’ll also want the RDKit library for some extra niche capabilities.
You could muck about with Anaconda environments, but the easiest way (if you’re on Mac) is to install RDKit with Homebrew. Open a new terminal window:
brew tap rdkit/rdkit
brew install rdkit
Then we can just call PyPI for the graph libraries:
pip install spektral
And we’re pretty much good to go.
Spektral’s tutorial example is a citation network, composed of peer-reviewed papers published in various scientific journals. The graph’s nodes are papers, and undirected edges are drawn to represent citations. Kieran Healy put together a rather large one for philosophy articles:
Looking at a finished example is a good clue as to why we’d want to do this in the first place. Perhaps the greatest single strength of graphs lies in community detection.
Visually it immediately clues us in to which papers are more influential: Whose works were quite seminal? Which studies formed “subcommunities” that are likely to center around highly-related topics? Are there papers that seem to reach across the spectrum, cited in many different areas?
The finishing touch (and end result of many graph networks) is coloring each paper — the aforementioned community detection — with labels, or classes. This lets us effectively generate new information about our data from existing data points (edges/citations), which is a hugely powerful ability.
The foundation of academia, after all, is standing on the shoulders of all who came before you. These color-class visualizations allow us to glean a deeper understanding of who “anchored” whom; whose ideas are influenced by or derivative (this isn’t necessarily bad!) of whose.
Let’s get building.
Graph Convolution in Ten Minutes
The code’s brilliantly simple, thanks to Spektral’s commitment to designing around the Keras API, which holds your hand through most of the complex stuff.
We start by importing Spektral’s GraphConv layer along with Keras’s Model, Input, and Dropout. Then we load example data from the pre-built Cora citation dataset: A is our sparse adjacency matrix, X is the node feature matrix (one row of features per paper), and y holds the labels.
Our task will be to predict the labels of nodes that the graph hasn’t yet seen, so we generate some Boolean mask arrays to divvy up the data.
train_mask is just an array of [True, False, …] where True indicates “use this node for training the model”.
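As a toy numpy sketch of what those masks do (sizes hypothetical; the real dataset is far larger):

```python
import numpy as np

# Hypothetical miniature split: 10 nodes, 6 train / 2 validation / 2 test.
n_nodes = 10
idx = np.arange(n_nodes)

train_mask = idx < 6                  # first 6 nodes train the model
val_mask = (idx >= 6) & (idx < 8)     # next 2 validate
test_mask = idx >= 8                  # last 2 are held out

labels = np.arange(n_nodes) % 7       # dummy class labels (7 classes, like the tutorial)
train_labels = labels[train_mask]     # only the training nodes' labels survive

print(train_labels)                   # [0 1 2 3 4 5]
```

Indexing any node-aligned array with a mask keeps just that split’s rows, so the same trick works on features, labels, and predictions alike.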
Next we build the model:
The hardest part of building most deep learning models often involves preprocessing and fitting the data to avoid shape mismatches, and we have special Input shapes of (F, ) and (N, ) here as well: F is the number of features per node, and N is the number of nodes, so each node carries a feature vector plus a full row of the adjacency matrix.
We’re feeding both the graph and its adjacency matrix into the model, since we need to include both the node and connection information.
Apart from that it’s pretty standard Keras-like architecture. We’ve got two GraphConv layers with a Dropout in between to regularize.
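Under the hood, a graph convolution of this kind boils down to mixing each node’s features with its neighbors’ via the (normalized) adjacency matrix, then applying an ordinary dense transform. A minimal numpy sketch of that propagation rule (shapes and names hypothetical, not Spektral’s actual implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

N, F, channels = 5, 4, 3              # nodes, input features, output channels
A_hat = np.eye(N)                     # stand-in for the normalized adjacency (self-loops only)
X = rng.normal(size=(N, F))           # node feature matrix
W = rng.normal(size=(F, channels))    # learnable weight matrix

def graph_conv(A_hat, X, W):
    """One GCN-style step: aggregate over neighbors, then transform, then ReLU."""
    return np.maximum(A_hat @ X @ W, 0.0)

H = graph_conv(A_hat, X, W)
print(H.shape)                        # (5, 3): one embedding per node
```

Stacking two of these, as the model above does, lets information flow two hops along citation edges before the final classification.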
Before we compile, we’ll take care of final preprocessing:
Spektral makes this ludicrously easy with a GraphConv.preprocess(matrix) method. Here we’re scaling the weights of each node’s connections (paper citations) based on its degree, or the number of connections to that node.
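That degree scaling is the standard GCN normalization trick: add self-loops, then symmetrically divide each edge by the square roots of both endpoints’ degrees. A hand-rolled numpy sketch (my toy graph, not Spektral’s internals):

```python
import numpy as np

# Toy citation graph: node 0 cites nodes 1 and 2.
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)

A_tilde = A + np.eye(3)                     # self-loops: each node keeps its own features
deg = A_tilde.sum(axis=1)                   # per-node degree (connection count)
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))

A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # scale each edge by both endpoints' degrees

print(np.round(A_hat, 3))                   # A_hat[0, 0] == 1/3: node 0 has degree 3
```

The upshot: heavily cited papers don’t drown out their neighbors, because each of their edges carries proportionally less weight.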
The tutorial also clarifies the reasoning behind using weighted_metrics (semi-supervised learning, boolean masks from before).
Some final data preparation converts our feature matrix to a dense array with toarray() and recasts the adjacency matrix to the proper dtype.
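As a rough sketch of that kind of cleanup (toy matrices, scipy/numpy only):

```python
import numpy as np
from scipy import sparse

# Stand-ins for the loaded data: sparse features and adjacency.
X_sparse = sparse.random(4, 3, density=0.5, format="csr", random_state=0)
A = sparse.eye(4, format="csr")

X = X_sparse.toarray()      # densify the node features for Keras
A = A.astype("float32")     # cast the adjacency to the dtype the model expects

print(type(X), X.dtype, A.dtype)
```

The adjacency stays sparse; only the (much smaller) feature matrix gets densified.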
Training is a little more interesting than usual, because we’re doing away with randomized sub-batching entirely.
Normally, if you’re feeding 10,000 images to a convolutional image net, you’d train the model on a batch at a time before starting a new epoch, updating weights, shuffling the data, and grabbing the next batch of images.
But splitting up our graph into sub-graphs could randomly chop off important citation connections and distort valuable information, so we set the batch size to the entire graph: all N nodes, every epoch.
Similarly, shuffling would scramble our adjacency matrix, randomly re-connecting papers to articles they didn’t actually cite and invalidating our dataset, so shuffling gets disabled too.
We throw in the boolean mask arrays from before to alter node weights during training and validation phases. This essentially takes care of train_test_split() in a graph-friendly way, so we can just feed the model the same graph, adjacency, and label matrices throughout.
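To see why masks work as sample weights, here’s a toy numpy sketch: weighting by a boolean mask zeroes out every node outside the split, so the metric only reflects the nodes we intend to score.

```python
import numpy as np

y_true = np.array([0, 1, 2, 1, 0])    # true classes for 5 nodes
y_pred = np.array([0, 1, 1, 1, 2])    # model's predicted classes
train_mask = np.array([True, True, True, False, False])

correct = (y_true == y_pred).astype(float)

plain_acc = correct.mean()                                     # scores every node
masked_acc = (correct * train_mask).sum() / train_mask.sum()   # scores training nodes only

print(plain_acc, masked_acc)          # 0.6 vs 0.666...
```

Swap in the validation or test mask and the same formula yields that split’s accuracy, all without ever slicing the graph apart.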
Evaluation is also quite streamlined.
For context, the number of classes is 7, so I’d say 70% accuracy is quite respectable for an 8-second training time with 2 layers.
You are now convoluted
That’s it. The irregular, hard-to-capture “universal structure”, the network graph, has been reduced and fed into a Keras layer for your convenience.
Of course, this is easy with pre-built data. The hardest part is usually finding good graph data and fitting it properly into the architecture.
A more immediately fascinating example I’ll unpack next time is Danielle Grattarola’s Regression of Molecular Properties, where nodes are heavy atoms and edges are chemical bonds.