Glossary of Deep Learning: Dynamic Memory Networks

Jaron Collis · Published in Deeper Learning · May 15, 2017
The component parts of a Dynamic Memory Network (from Kumar et al., where the weights signified by the arrows are explained)

A Dynamic Memory Network (DMN) is a neural network architecture optimised for question-answering (QA) problems. Given a training set of input sequences (knowledge) and questions, it can form episodic memories, and use them to generate relevant answers.

Whilst classic Encoder-Decoder (Seq2Seq) models can solve QA problems, their performance is limited by the small size of their ‘memory’ — this is what’s encoded by their hidden states and weights, and reflects the information that’s passed between encoder and decoder. This limitation becomes especially apparent when dealing with very long sequences of data, as we might find in sources like books or videos, where the salient facts might occur a long time apart in very different contexts.

This limitation can be solved by storing multiple hidden states, and then using a strategy called an Attention Mechanism to choose between them. This allows the network to refer back to the input sequence, instead of forcing it to encode all information into one fixed-length vector like Seq2Seq does. This post on Attention and Memory in Deep Learning has a good explanation of why this is so helpful.
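To make the idea concrete, here is a minimal sketch of an attention step in PyTorch (the shapes, names and dot-product scoring are illustrative assumptions, not the exact formulation used in any particular paper): every encoder hidden state is kept, a query vector scores them, and a softmax-weighted sum becomes the context vector passed onward.

```python
import torch
import torch.nn.functional as F

# Minimal attention sketch: instead of squeezing everything into one vector,
# keep all encoder hidden states and let a query score them.
def attend(query, hidden_states):
    # query: (hidden_dim,)   hidden_states: (seq_len, hidden_dim)
    scores = hidden_states @ query        # one relevance score per input position
    weights = F.softmax(scores, dim=0)    # normalise scores to sum to 1
    context = weights @ hidden_states     # weighted sum of the hidden states
    return context, weights

hidden_states = torch.randn(10, 64)   # e.g. 10 encoded input words
query = torch.randn(64)               # e.g. an encoded question
context, weights = attend(query, hidden_states)
```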

DMNs refine the attention mechanism, so that questions trigger an iterative attention process which allows the model to condition its attention on the inputs and the result of previous iterations.

The reference papers for DMNs are Xiong et al. 2016 and Kumar et al. 2015; they describe an architecture with the following components:

The Semantic Memory Module (analogous to a knowledge base) consists of pre-trained GloVe vectors that are used to create sequences of word embeddings from input sentences. These vectors will act as inputs to the model.
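As a rough sketch, the semantic memory can be thought of as a lookup table from words to pre-trained vectors. The file name, dimensionality and example sentence below are assumptions; any released GloVe file follows the same plain-text "word v1 v2 …" format.

```python
import numpy as np

# Load pre-trained GloVe vectors into a simple word -> vector lookup.
def load_glove(path):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.split()
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

glove = load_glove("glove.6B.50d.txt")            # assumed file name
sentence = "the office is north of the kitchen".split()
embeddings = np.stack([glove[w] for w in sentence])  # (seq_len, 50) word embeddings
```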

The Input Module processes the input vectors associated with a question into a set of vectors termed facts. This module is implemented using a GRU (Gated Recurrent Unit), which is similar to an LSTM but simpler, and so more computationally efficient. The GRU enables the network to learn whether the sentence currently under consideration is relevant to the answer or has nothing to do with it.
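A minimal sketch of this module, assuming made-up dimensions and sentence boundaries: a GRU reads the word embeddings, and its hidden state at each end-of-sentence position is taken as one fact vector.

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 50, 80                    # assumed sizes
input_gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)

word_embeddings = torch.randn(1, 24, embed_dim)   # 24 embedded input words
sentence_ends = [7, 15, 23]                       # assumed end-of-sentence positions

outputs, _ = input_gru(word_embeddings)           # hidden state at every word
facts = outputs[0, sentence_ends]                 # (num_sentences, hidden_dim) fact vectors
```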

The Question Module processes the question word by word, and outputs a vector using the same GRU as the input module, and the same weights. Both facts and questions are encoded as embeddings.
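The key point is the weight sharing, which a sketch makes explicit (shapes are again assumptions): the same GRU object that encodes the input text also encodes the question, and its final hidden state is taken as the question vector.

```python
import torch
import torch.nn as nn

embed_dim, hidden_dim = 50, 80
shared_gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)  # same GRU (and weights) as the Input Module

question_embeddings = torch.randn(1, 6, embed_dim)  # e.g. "where is the office ?"
_, q = shared_gru(question_embeddings)              # final hidden state
q = q.squeeze(0)                                    # (1, hidden_dim) question vector
```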

The Episodic Memory Module receives the fact and question vectors extracted from the input and encoded as embeddings. This uses a process inspired by the brain’s hippocampus, which can retrieve temporal states that are triggered by some response, like sights or sounds.

On silicon, episodic memory is composed of two nested GRUs.

The inner GRU (the lower line in the diagram above) generates what are called episodes. The outer GRU (the upper line in the diagram) generates the final memory vector by working over the sequence of these episodes.

Episode generation involves the inner GRU passing over the facts from the input module. When updating its inner state, it takes into account the output of an attention function on the current fact — which gives a score between 0 and 1 to each fact. This allows the GRU to ignore facts with low scores.

After each full pass on all the facts, the inner GRU outputs an episode which is then fed to the outer GRU, whose state has been initialised by the question vector.
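Putting the two nested GRUs together gives a sketch like the one below. The attention scorer here is a deliberately simplified stand-in (a small MLP over the fact, memory and question vectors); the papers use a richer hand-crafted feature vector, and all dimensions and pass counts are assumptions.

```python
import torch
import torch.nn as nn

hidden_dim, num_passes = 80, 3
inner_gru = nn.GRUCell(hidden_dim, hidden_dim)   # reads facts, builds an episode
outer_gru = nn.GRUCell(hidden_dim, hidden_dim)   # reads episodes, builds the memory
attention = nn.Sequential(nn.Linear(3 * hidden_dim, hidden_dim),
                          nn.Tanh(), nn.Linear(hidden_dim, 1), nn.Sigmoid())

facts = torch.randn(5, hidden_dim)   # from the Input Module
q = torch.randn(1, hidden_dim)       # from the Question Module

memory = q                           # the outer state is initialised with the question
for _ in range(num_passes):
    h = torch.zeros(1, hidden_dim)   # inner state for this pass
    for fact in facts:
        fact = fact.unsqueeze(0)
        g = attention(torch.cat([fact, memory, q], dim=1))  # score between 0 and 1
        h = g * inner_gru(fact, h) + (1 - g) * h             # low scores leave the state unchanged
    episode = h                                              # result of one full pass over the facts
    memory = outer_gru(episode, memory)                      # outer GRU consumes the episode
```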

The reason the DMN uses multiple episodes is that the model can learn what part of a sentence it should pay attention to, and then realise after one pass that something else was also important. Multiple passes allow it to gather increasingly relevant information.

Finally, the Answer Module generates an appropriate response. By the final pass, the episodic memory should contain all the information required to answer the question. This module uses another GRU, trained with a cross-entropy loss against the correct output sequence, which can then be converted back to natural language.
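A single decoding step might look like the sketch below, where the decoder state is initialised with the final memory and the question is fed in at each step. The vocabulary size, shapes and the one-hot encoding of the previous word are all illustrative assumptions.

```python
import torch
import torch.nn as nn

hidden_dim, vocab_size = 80, 1000                    # assumed sizes
answer_gru = nn.GRUCell(vocab_size + hidden_dim, hidden_dim)
to_vocab = nn.Linear(hidden_dim, vocab_size)
loss_fn = nn.CrossEntropyLoss()

memory = torch.randn(1, hidden_dim)   # final state of the episodic memory
q = torch.randn(1, hidden_dim)        # question vector
target = torch.tensor([42])           # index of the correct answer word

a = memory                                            # decoder state starts at the memory
prev_word = torch.zeros(1, vocab_size)                # one-hot of the previous word (none yet)
a = answer_gru(torch.cat([prev_word, q], dim=1), a)   # one decoding step
logits = to_vocab(a)                                  # scores over the vocabulary
loss = loss_fn(logits, target)                        # cross-entropy against the true word
```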
