What’s a differentiable neural computer (DNC)? Part 1.

Paper in question: “Hybrid computing using a neural network with dynamic external memory”, recently released by Google DeepMind.

Download the paper here.

The problem with neural networks is that they tend to co-mingle objects and computation: very often, a network is a mixture of both the objects themselves and the logic that operates on those objects.

This makes them hard to generalize to new objects, and hard to use when the size of the input vector changes. Separating the logic from the memory is the key idea.

A DNC is a neural net coupled to a memory matrix. The behavior of the network is independent of the memory size.

Think of the neural net as the CPU, and the memory as the RAM. Except here, the CPU is a neural net that’s programmed via gradient descent.
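To make that separation concrete, here's a minimal numpy sketch (the names and sizes are mine, not the paper's). The point to notice: the controller is an ordinary neural net whose weight shapes never mention N, the number of memory rows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (mine, not the paper's): the memory is an N x W
# matrix; the controller is an ordinary neural net -- a single dense
# layer here, for brevity.
N, W = 16, 8                 # number of memory rows, width of each row
input_size, hidden = 4, 32

# Controller weights. Crucially, none of these shapes involve N:
# the "CPU" is sized independently of the "RAM".
W_in = rng.normal(size=(hidden, input_size + W))   # sees input + last read
b_in = np.zeros(hidden)

memory = np.zeros((N, W))        # the external memory ("RAM")
last_read = memory.mean(axis=0)  # placeholder for the previous read vector

def controller_step(x, last_read):
    """One tick of the "CPU": consume the input plus the last read vector."""
    return np.tanh(W_in @ np.concatenate([x, last_read]) + b_in)

h = controller_step(rng.normal(size=input_size), last_read)
print(h.shape)  # (32,) -- unchanged even if we double N
```

Because the controller only ever sees fixed-width read vectors, you can swap in a bigger memory matrix without retraining a single weight shape. That's the "behavior is independent of the memory size" claim in practice.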

This is where I get stuck in the paper: “Whereas conventional computers use unique addresses to access memory contents, a DNC uses differentiable attention mechanisms to define distributions over the N rows, or ‘locations’, in the N × W memory matrix M.”
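Here's what I eventually worked out that sentence means. A conventional address picks exactly one row. A “distribution over the N rows” is a weighting vector with non-negative entries summing to 1, and reading with it is just a weighted sum of rows, which is differentiable with respect to both the weighting and the memory. A toy numpy sketch (my own, not from the paper):

```python
import numpy as np

N, W = 5, 3
M = np.arange(N * W, dtype=float).reshape(N, W)   # toy memory matrix

# Hard addressing: read row 2 exactly. Equivalent to a one-hot weighting.
w_hard = np.array([0.0, 0.0, 1.0, 0.0, 0.0])

# Soft addressing: mostly row 2, with some mass on neighbouring rows.
w_soft = np.array([0.0, 0.2, 0.7, 0.1, 0.0])

read_hard = w_hard @ M   # exactly M[2]
read_soft = w_soft @ M   # a blend of rows; gradients flow through w_soft

print(read_hard)  # [6. 7. 8.]
print(read_soft)  # [5.7 6.7 7.7]
```

The one-hot case is what a normal computer does; the soft case is what lets gradient descent learn *where* to read and write.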

What are “differentiable attention mechanisms”? The paper cites four other papers for the term.

That’s a lot of intense reading.

Here’s a good Quora answer: https://www.quora.com/What-is-exactly-the-attention-mechanism-introduced-to-RNN-recurrent-neural-network-It-would-be-nice-if-you-could-make-it-easy-to-understand

So basically, in recurrent neural networks used for image recognition, there's an approach where the model doesn't examine the entire image at once; instead, it examines a small portion of the image at a time and scans to the relevant locations for more information. This is akin to how we humans examine images: we have a focal point, and we scan around the picture.

The DNC borrows this same attention trick: instead of scanning over an image, the network scans over the rows of its memory matrix.
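One concrete version of this, which the paper calls content-based addressing: compare a key vector against every memory row with cosine similarity, then softmax the scores into a weighting over the rows. A rough numpy sketch:

```python
import numpy as np

def content_weighting(M, key, beta):
    """Return a distribution over the N rows of M, focused on the rows
    that look most like `key`. beta controls how peaked the focus is."""
    eps = 1e-8
    sims = (M @ key) / (np.linalg.norm(M, axis=1) * np.linalg.norm(key) + eps)
    scores = beta * sims
    e = np.exp(scores - scores.max())    # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
M = rng.normal(size=(6, 4))              # 6 memory rows of width 4
key = M[3] + 0.05 * rng.normal(size=4)   # a noisy copy of row 3

w = content_weighting(M, key, beta=10.0)
print(w.round(3))   # most of the mass lands on row 3
read = w @ M        # the soft read, exactly as in the earlier sketch
```

Chaining these pieces together, the controller emits a key, attention turns the key into a weighting over memory rows, and the weighted sum becomes the read vector fed back into the controller. Every step is differentiable, so the whole thing trains end to end with gradient descent.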