Neural graph memory

David Mack
Published in
5 min readJun 22, 2018


We’re excited to preview a new architecture for connecting neural networks to knowledge graphs. Neural Graph Memory (NGM) is a fully differentiable architecture for general purpose graph data retrieval.

The importance of knowledge graphs

Graph structure, that of entities and relationships between them, is a natural representation for information in our world. From our transit networks and physical spaces, through to our social interactions and linguistic associations, every part of the human experience is a densely connected, almost infinite graph of information.

Graph structure is required to fully capture that information and to traverse it. Furthermore, graph structure enables the continual aggregation of new information into the existing knowledge base.

Knowledge graphs as a memory for intelligent agents

Therefore, we propose that a graph is best structure for intelligent agents to store, process and recall information from the world around them. We propose that memory recall may often involve multi-step graph traversal.

There is biological plausibility to this model. Neuron structures’ most striking characteristic is their very wide fan-out and the inter-connection with distant synapses. Human memory recall often occurs as a series of conscious associative steps (‘I left my keys.. in my wallet.. which was under the bed.. in the hotel’) and performing complex association chains, or recalling facts for which only scant details can be remembered, can take a very long time.

The familiar phenomenon of remembering a fact days after an attempted recall (‘What was that artists name?’), we suggest could also be related to a long-running graph traversal.

The problem posed by graph data

Whilst graph structure is a very powerful way to represent data (in mathematics it can be used as a basis for most other structures) its powerfulness is also a problem. It can be hard to process graph data since its structure is so flexible and expansive. For this reason, many of the worlds most popular databases are table-based. It’s much easier to write an algorithm that iterates over a list of items than it is to process a graph.

The bulk of the progress and deployment of machine learning has been on tabular/fixed sized data (e.g. images in convolutional networks, text and tabular features in dense/recurring/reinforcement networks) most likely for similar reasons. There has been a lot of research and progress in models that learn from graph data, however the field is newer and less developed.

Neural Graph Memory enables neural networks to access knowledge graphs and traverse their data. We hope that this and other future innovations will make knowledge graphs a common and straightforward component of machine learning systems.

The Neural Graph Memory architecture

Here we outline the NGM architecture. We’re currently implementing a series of tests and benchmarks of the architecture, which we plan to publish later in the year. We’re also employing it in a Graph-Question-Answer system as a solution to the CLEVR Graph dataset.

NGM is an attention based module, using content addressing for the selection of data. It fuses together edges and nodes into (Node, Edge, Node) triples. This triple structure enables retrieval based on known properties of either nodes or edges or both. The triple structure also provides a simple way for neural networks to process both nodes and edges simultaneously without having to build special pathways for each.

As NGM allows both selection and recall of both node and edge data, it is sufficiently general to implement any graph traversal algorithm given multiple iteration steps. When combined with a graph-write module, it allows for the implementation of any graph algorithm.


NGM takes a two read inputs: content c and mask m from the parent network and outputs a data signal d. The data signal is a linear combination of the matched memory content. Whilst this is useful both in a one-shot feed-forward network and in a recurrent network, we expect it is most powerful in a recurrent network. Used in a recurrent network (e.g. a Differentiable Neural Computer, or MACnet), graph traversal and aggregation are possible by combining the data signal into subsequent read signals.


We consider a directed graph G to be a set of vertices v ∈ V and edges (v,v) E. Each vertex and edge has an associated dictionary of properties pᵥ(v) and pₑ(v,v).

We define a graph table as the expansion of every edge:

The graph table forms the basis of the data exposed to the neural network.

There are two inputs to the NGM module:

  • Read content c — The data to compare memory contents to
  • Read mask m — A mask of which parts of the memory contents to compare to

The NGM performs attention across the graph table elements t∈ t using a comparison function f(c,m,t) that returns a comparison score between the content c and each element t of the graph table.

Many comparison functions are possible, here we use the general form with masking (where W is a trainable parameter, o-dot is element-wise multiplication and we assume our vectors are row-shaped):

The full read operation is then performed in the standard attentional way by applying softmax to the comparison with the entire graph table, then using that to perform a weighted sum of the graph table:

Further work

We’re actively working on testing and benchmarking this approach, and using it to build models for graph based reasoning. We’ve a couple of key areas to develop further:

  • Writing to graph memory
  • Selecting and paging parts of a large graph into memory
  • Performing multiple parallel read operations across a sharded graph

As we make progress with this we’ll release further code and articles.

Octavian is an open-source organization and we welcome collaboration with outside researchers.


Thanks to Ashwath Kumar Salimath for helping draft and proof this article.



David Mack

@SketchDeck co-founder, researcher, I enjoy exploring and creating.