Deep Learning with Knowledge Graphs

Andrew Jefferson
Nov 13, 2018 · 8 min read

Last week I gave a talk at Connected Data London on the approach that we have developed at Octavian to use neural networks to perform tasks on knowledge graphs.

Here’s the recording of the talk, from Connected Data London:

In this post I will summarize that talk (including most of the slides) and provide links to the papers that have most significantly influenced us.

To find out more about our vision for a new approach to building the next generation of database query engine see our recent article.

What is a graph?

[Image: Two functionally identical graph models]

We are using a property graph or attributed graph model. Nodes (vertices) and relations (edges) can have properties. In addition, our neural network has a global state that is external to the graph. The slide shows two representations of this model, one from Neo4j and the other from DeepMind (n.b. these are effectively identical).
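
As a rough illustration of that model (my own sketch in Python with made-up station names, not Neo4j's or DeepMind's actual data structures), a property graph plus an external global state might look like this:

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        properties: dict = field(default_factory=dict)

    @dataclass
    class Edge:
        source: int                 # index of the source node
        target: int                 # index of the target node
        properties: dict = field(default_factory=dict)

    @dataclass
    class Graph:
        nodes: list
        edges: list
        global_state: dict = field(default_factory=dict)   # state held outside the graph

    g = Graph(
        nodes=[Node({"name": "King's Cross"}), Node({"name": "Oxford Circus"})],
        edges=[Edge(0, 1, {"line": "Victoria"})],
    )

A graph database such as Neo4j stores essentially this shape of data; the global state is something our network carries alongside the graph rather than part of the graph itself.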

Why are we interested in graphs?

[Image: All the graphs!]

Graphs have a rich history, from Leonhard Euler in the 18th century to the whole range of graphs in use today. Within the field of computer science there are many applications of graphs: graph databases, knowledge graphs, semantic graphs, computation graphs, social networks, transport graphs and many more.

Graphs have played a key role in the rise of Google (their first breakthrough was using PageRank to power searches, today their Knowledge Graph has grown in importance) and Facebook. From politics to low cost international air travel, graph algorithms have had a major impact on our world.

What is Deep Learning?

[Image: I’m not sure about A.I.; let’s talk about Deep Learning…]

Deep learning is a branch of machine learning centered around training multi-layer (“deep”) neural networks using gradient descent. The basic building block of these neural networks is the dense (or fully connected) layer.
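
As a minimal sketch (my own toy example in NumPy, not any particular framework's API), a dense layer is just a matrix multiply plus a bias followed by a non-linearity, and a “deep” network stacks several of them:

    import numpy as np

    def dense(x, W, b):
        # Every input unit is connected to every output unit.
        return np.maximum(0.0, x @ W + b)   # ReLU non-linearity

    rng = np.random.default_rng(0)
    x = rng.normal(size=(1, 8))             # a single 8-dimensional input
    W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
    W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)

    hidden = dense(x, W1, b1)               # first dense layer
    output = hidden @ W2 + b2               # final linear layer
    print(output.shape)                     # (1, 4)

In a real network the weights W1, b1, W2, b2 would be learned by gradient descent rather than sampled at random.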

[Image: A deep neural network using dense layers]

Deep learning has allowed us to train computers to tackle a range of previously challenging tasks, from playing Go to image recognition, with superhuman performance.

[Image: MacNets and other examples of superhuman image processing neural networks]

Machine Learning

In general, machine learning is a simple concept. We create a model of how we think things work, e.g. y = mx + c, which could be:

house_price = m • number_of_bedrooms + c
[Image: Machine learning, view from 20,000ft]

We train (fit) the parameters of our model (m and c in the example) using the data that we have. Once our training is done we have some learned parameter values and we have a model that we can use to make predictions.
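
To make that concrete, here is a toy sketch (my own example with made-up numbers, not Octavian’s code) that fits m and c by gradient descent on invented house-price data:

    import numpy as np

    # Hypothetical data: number of bedrooms vs. price in £k (made up for illustration).
    bedrooms = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    prices   = np.array([150.0, 200.0, 250.0, 300.0, 350.0])

    m, c = 0.0, 0.0        # the parameters we will learn
    lr = 0.01              # learning rate

    for step in range(5000):
        error = (m * bedrooms + c) - prices
        # Gradients of the mean squared error with respect to m and c.
        m -= lr * 2 * np.mean(error * bedrooms)
        c -= lr * 2 * np.mean(error)

    print(m, c)            # should approach m ≈ 50, c ≈ 100

Once the loop finishes, predicting the price of an unseen house is just m • number_of_bedrooms + c with the learned values.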

Sometimes the parameters are useful by themselves (e.g. when we use a neural network to train a word embedding such as word2vec).

Deep Learning on Graphs

At Octavian one of the questions we asked ourselves is: how would we like machine learning on graphs to look from 20,000ft?

To help answer this question, we compared traditional forms of deep learning to the world of graph learning:

[Image: Comparing graph machine learning with other setups]

We identified three graph-data tasks which we believe require graph-native implementations: Regression, Classification and Embedding.

Aside: there are other graph-specific tasks such as link prediction that don’t easily fit into the three tasks above.

We observed that many existing techniques for machine learning on graphs have some fundamental limitations:

  • Some do not work on unseen graphs (because they require first training a graph embedding)
  • Some require converting the graph into a table, discarding its structure (e.g. sampling from a graph using random walks, as sketched below)
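
As an aside, here is roughly what such a random-walk sample looks like (a toy sketch with a made-up adjacency list, not any particular library’s implementation). Each walk is a flat sequence, so most of the surrounding graph structure is thrown away:

    import random

    def random_walk(adjacency, start, length):
        # Sample a fixed-length walk; "graph to table" methods feed walks like
        # this to a sequence model, losing most of the wider graph structure.
        walk = [start]
        for _ in range(length):
            walk.append(random.choice(adjacency[walk[-1]]))
        return walk

    adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    print(random_walk(adjacency, "A", 4))   # e.g. ['A', 'B', 'C', 'B', 'A']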

Existing Work

[Image: Performance of DL models on graph problems is not superhuman]

Much of the existing work using Deep Learning on graphs focuses on two areas.

  1. Making predictions about molecules (including proteins), their properties and reactions.
  2. Node classification/categorisation in large, static graphs.

Graphs, Neural Networks and Structural Priors

It’s often said that Deep Learning works well with unstructured data — images, free text, reinforcement learning etc.

But our superhuman neural networks are actually dealing with very specifically structured information, and their architectures are engineered to match the structure of the information they work with.

[Image: Data structures that work with neural networks]

Images are in a sense structured: they have a rigid 2D (or 3D) structure where pixels that are close to each other are more relevant to each other than pixels that are far apart. Sequences (e.g. over time) have a 1D structure where items that are adjacent are more relevant to one another than items that are far apart.

[Image: Dense layers make sense for Go, where locations that are far apart on the board can have equal influence on one another]

When working with images and sequences, dense layers (where every input is connected to every output) don’t work well. Neural network layers that reflect and exploit the structure of the input medium achieve the best results.

For sequences Recurrent Neural Networks (RNNs) are used and for images Convolutional Neural Networks (CNNs) are used.

[Image: Convolutional networks structurally encode that pixels which are close to one another are more significant than pixels which are far apart]

In a convolutional neural network each pixel in the hidden layer only depends on a group of nearby pixels in the input (compare this to a dense layer where every hidden layer pixel depends on every input pixel).
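
A crude way to see the difference (a toy 1D sketch, not any framework’s convolution API): each output of a convolution depends only on a small window of nearby inputs, and the same small set of weights is reused at every position.

    import numpy as np

    def conv1d(signal, kernel):
        # Each output element sees only a small neighbourhood of the input.
        k = len(kernel)
        return np.array([
            np.dot(signal[i:i + k], kernel)
            for i in range(len(signal) - k + 1)
        ])

    signal = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    kernel = np.array([0.25, 0.5, 0.25])    # a tiny smoothing filter
    print(conv1d(signal, kernel))           # [1. 2. 3. 4.]

A dense layer applied to the same signal would instead connect all six inputs to every output, with no notion of “nearby”.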

[Image: Neither dense nor convolutional networks make sense for a transit graph]

Nodes in graphs do not have fixed relations like nearby pixels in an image or adjacent items in a sequence. To make deep learning successful with graphs, it’s not enough to convert graphs to a matrix representation and feed that into existing neural network models. We have to figure out how to create neural network models that work well for graphs.

[Image: This paper makes the same argument more effectively than I do — check it out!]

We aren’t the only people thinking about this. Some very clever people at DeepMind, Google Brain, MIT and University of Edinburgh lay out a similar position in their paper on Relational Inductive Biases. I recommend this paper to anyone interested in deep learning on graphs.

The paper introduces a general algorithm for propagating information through a graph and argues that, by using neural networks to learn six functions that perform aggregations and transforms within the structure of the graph, they can achieve state-of-the-art performance on a selection of graph tasks.

[Image: One algorithm to rule them all?]

By propagating information between nodes principally along the graph edges, the authors argue, they maintain the relational inductive biases present in the graph structure.
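
To make the idea concrete, here is a heavily simplified sketch of one such propagation step (my own toy version: the paper’s update and aggregation functions are replaced by fixed sums and means purely to show the data flow, and all feature vectors share one dimensionality for simplicity):

    import numpy as np

    def gn_block(nodes, edges, senders, receivers, u):
        # 1. Edge update: each edge combines itself, its two endpoint nodes
        #    and the global state u.
        new_edges = np.array([
            edges[k] + nodes[senders[k]] + nodes[receivers[k]] + u
            for k in range(len(edges))
        ])
        # 2. Node update: each node aggregates (sums) its incoming updated edges.
        new_nodes = nodes.copy()
        for i in range(len(nodes)):
            incoming = new_edges[[k for k in range(len(edges)) if receivers[k] == i]]
            agg = incoming.sum(axis=0) if len(incoming) else np.zeros_like(nodes[i])
            new_nodes[i] = nodes[i] + agg + u
        # 3. Global update: the global state aggregates over all edges and nodes.
        new_u = u + new_edges.mean(axis=0) + new_nodes.mean(axis=0)
        return new_nodes, new_edges, new_u

    # A tiny 3-node, 2-edge graph with 2-dimensional features.
    nodes = np.ones((3, 2))
    edges = np.ones((2, 2))
    senders, receivers = [0, 1], [1, 2]
    u = np.zeros(2)
    print(gn_block(nodes, edges, senders, receivers, u))

Stacking several such blocks lets information flow multiple hops across the graph while always respecting its edge structure.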

The MacGraph neural network architecture that we have been developing at Octavian has similarities to the relational inductive biases approach. It employs a global state that exists outside the graph and also propagates information between the graph nodes.

Octavian’s experimental results

Before I can tell you about our results at Octavian I have to mention the task that we used to test our neural graph architecture.

[Image: Our synthetic benchmark dataset]

You can read more about CLEVR-Graph here. It’s a synthetic (procedurally generated) dataset which consists of 10,000 fictional transit networks loosely modelled on the London underground. For each randomly generated transit network graph we have a single question and correct answer.

[Image: Some example questions from the CLEVR-Graph question bank and an example graph]

The crucial thing about this task is that each graph used to test the network is one the network has never seen before. Therefore it cannot memorise the answers to the questions but must learn how to extract the answer from new graphs.

At the time of writing, MacGraph is achieving almost-perfect results on tasks requiring six different skills:

[Image: MacGraph’s latest results on CLEVR-Graph]

I think that one of the most exciting skills is MacGraph’s ability to answer “How many stations are between {station} and {station}”, because to solve that question it’s necessary to determine the shortest path between the stations (Dijkstra’s algorithm), which is a complex and graph-specific algorithm.
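
For reference, here is the classical, hand-coded way to get that answer on an unweighted transit graph (a breadth-first-search sketch with made-up station names; Dijkstra’s algorithm generalises this to weighted edges). MacGraph has no such routine built in and must learn equivalent behaviour from question/answer pairs alone:

    from collections import deque

    def stations_between(adjacency, start, goal):
        # Breadth-first search; returns the number of intermediate stations
        # on a shortest path, or None if the goal is unreachable.
        frontier = deque([(start, 0)])
        seen = {start}
        while frontier:
            station, dist = frontier.popleft()
            if station == goal:
                return max(dist - 1, 0)
            for neighbour in adjacency[station]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append((neighbour, dist + 1))
        return None

    # A hypothetical miniature line: A - B - C - D
    adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    print(stations_between(adjacency, "A", "D"))   # 2 (stations B and C)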

How does MacGraph work?

It’s not sufficient to just propagate information between nodes in the graph using transformation and aggregation functions. To answer natural language questions about a graph with natural language answers, it’s necessary to transform the input question into a graph state that leads to the correct answer, and then to extract the answer information from the graph state and transform it into the desired answer.

[Image: …almost]

Our solution for transforming between natural language and graph state is to use attention. You can read more about how this works here.

[Image: An alternative to dense layers]

Attention cells are radically different to dense layers. Attention cells work with lists of information, extracting individual elements depending on their content or location.

These properties make attention cells great for selecting from the lists of nodes and edges that make up a graph.
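
A rough sketch of what a content-based attention read over a list of node states looks like (my own simplification, not the exact MacGraph cell):

    import numpy as np

    def content_attention(query, keys, values):
        # Score each list item against the query, softmax the scores,
        # and return the weighted sum of the values: a soft "select".
        scores = keys @ query
        weights = np.exp(scores - scores.max())
        weights = weights / weights.sum()
        return weights @ values, weights

    # Hypothetical node states: 3 nodes with 4-dimensional features.
    nodes = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])
    query = np.array([0.0, 5.0, 0.0, 0.0])       # "looking for" the second node
    read, weights = content_attention(query, nodes, nodes)
    print(weights.round(3))                      # most weight on the second node

Because the selection is a softmax rather than a hard index, the whole operation stays differentiable and can be trained end to end.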

In MacGraph, write attention is used to input a signal to the nodes in the graph based on the query and the properties of the nodes. This signal then primes the graph message passing to interact with the nodes most relevant to the question.

[Image: Prime the graph with the query using write attention]

After information has propagated through the graph’s nodes, attention is used to extract the answer from the graph:

[Image: Read from the graph using attention]

Combining write and read attention with propagation between nodes along the graph structure, we get the core of MacGraph.
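
A very rough sketch of that write/propagate/read loop under my own simplifying assumptions (plain NumPy, a single step, the question vector reused as both write and read query), not the actual MacGraph implementation:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def macgraph_like_step(question, nodes, adjacency):
        # 1. Write attention: add a question-dependent signal to the most
        #    relevant nodes.
        write_w = softmax(nodes @ question)
        nodes = nodes + np.outer(write_w, question)
        # 2. Message passing: each node also receives the sum of its
        #    neighbours' states, following the graph structure.
        nodes = nodes + adjacency @ nodes
        # 3. Read attention: extract an answer vector from the updated nodes.
        read_w = softmax(nodes @ question)
        return read_w @ nodes

    # A hypothetical 3-station path graph with 4-dimensional node features.
    adjacency = np.array([[0, 1, 0],
                          [1, 0, 1],
                          [0, 1, 0]], dtype=float)
    nodes = np.eye(3, 4)
    question = np.array([0.0, 1.0, 0.0, 0.0])
    print(macgraph_like_step(question, nodes, adjacency))

In MacGraph itself the write and read transformations are learned, and the resulting vector is transformed into a natural language answer.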

[Image: MacGraph architecture]

Conclusion

There is a strong case that achieving superhuman results on graph-based tasks requires graph-specific neural network architectures.

We have shown with MacGraph that a neural network can learn to extract properties from nodes within a graph in response to questions and that a neural network can learn to perform graph algorithms (such as finding the shortest path) on graphs that it has never encountered before.
