Neural Arithmetic Logic Units (NALU) — A new beginning?

When I was very young, my scores used to be average at best. A lot of people used to score better than me, especially in exams that needed paragraph or essay-type answers.

Why?

Because I used to understand the underlying concepts and then explain them in my own words. Since English was not my first language, I could barely express what I wanted to say, and so I scored average at best. Let’s say this was Type A learning.

On the other hand, there were students in my class who used to memorize the answers exactly as the teachers provided them in class, without necessarily understanding them. Let’s say this was Type B learning.

Thankfully, I did not give in to the urge to score higher marks at the cost of understanding the subject. When unseen questions came up in exams, I could answer them while most of the others could not.

What does this have to do with Neural Networks?

Well, the current way of training Neural Networks is very similar to the Type B learning mentioned above. They need a huge amount of data to learn things, and they work well only when they see inputs similar to what they were trained on.

They don’t generalize well outside the range of values seen during training.

Andrew Trask et al. (from DeepMind) recently showed that neural nets cannot even learn the scalar identity function outside the range of values they were trained on.
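One intuition for this failure, at least for saturating activations such as tanh, is that the output of such a network is bounded. The sketch below is not the paper's experiment: it uses a single tanh unit with hand-picked weights (the value `s = 0.01` and the name `tiny_tanh_net` are assumptions made purely for illustration) that approximates the identity well on small inputs and breaks down far outside that range.

```python
import math

def tiny_tanh_net(x, s=0.01):
    """One hidden tanh unit with hand-set weights: y = (1/s) * tanh(s * x).

    For small |s * x|, tanh(s * x) is close to s * x, so y is close to x.
    The output can never exceed 1/s in magnitude, however large x gets.
    """
    return (1.0 / s) * math.tanh(s * x)

inside = tiny_tanh_net(5.0)      # input in the range where the fit is good
outside = tiny_tanh_net(1000.0)  # input far outside that range

print(abs(inside - 5.0))      # small error
print(abs(outside - 1000.0))  # large error: the output is capped near 1/s
```

A trained network would find its own weights, but the same cap applies: once the tanh saturates, larger inputs no longer change the output.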

The authors introduced two new models, the neural accumulator (NAC) and the neural arithmetic logic unit (NALU). Early results from attaching the two models to CNNs and LSTMs show much better generalization, i.e. more of the Type A kind of learning.

What is a Neural Accumulator (NAC)?

Simply put, a NAC is a linear transformation of its inputs which keeps the scaling of the input and output vectors consistent.

A NAC is a special case of a linear (affine) layer whose transformation matrix W consists just of −1’s, 0’s, and 1’s; that is, its outputs are additions or subtractions of elements of the input vector, rather than arbitrary rescalings.

This prevents the layer from changing the scale of the representations of the numbers when mapping the input to the output, meaning that the representations stay consistent throughout the model, no matter how many operations are chained together.
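To make this concrete, here is a minimal sketch of a NAC forward pass in plain Python. It relies on the parameterization given in the referenced paper (not spelled out above), where the effective weights are W = tanh(Ŵ) ⊙ σ(M̂), which pushes each entry towards −1, 0, or 1. The function names and the hand-picked parameter values are assumptions for illustration; in a real model Ŵ and M̂ would be learned.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def nac_weights(w_hat, m_hat):
    """Effective weights W = tanh(W_hat) * sigmoid(M_hat), elementwise."""
    return [[math.tanh(w) * sigmoid(m) for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(w_hat, m_hat)]

def nac_forward(x, w_hat, m_hat):
    """Linear map with no bias: each output is a weighted sum of the inputs."""
    W = nac_weights(w_hat, m_hat)
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical saturated parameters: large magnitudes drive W close to
# [[1, 1], [1, -1], [0, 1]].
w_hat = [[20.0, 20.0], [20.0, -20.0], [20.0, 20.0]]
m_hat = [[20.0, 20.0], [20.0, 20.0], [-20.0, 20.0]]

small = nac_forward([3.0, 5.0], w_hat, m_hat)      # close to [3+5, 3-5, 5]
large = nac_forward([300.0, 500.0], w_hat, m_hat)  # same operations, 100x scale
print(small, large)
```

Because the weights sit near −1, 0, and 1, the same add, subtract, and select operations hold at any input scale.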

What about NALU models? How do they work?

To encourage more systematic numerical extrapolation, the authors propose an architecture that represents numerical quantities as linear activations which are manipulated using primitive arithmetic operators, controlled by learned gates.

The authors call this module a neural arithmetic logic unit (NALU), by analogy to the arithmetic logic unit in traditional processors (remember the ALU!).
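The gating idea can be sketched in a few lines of plain Python. This follows the formulation in the referenced paper as I understand it: an additive path a = Wx, a multiplicative path m = exp(W log(|x| + ε)), and a learned gate g = σ(Gx) that mixes them as y = g ⊙ a + (1 − g) ⊙ m. The function names, the value of `eps`, and the hand-set matrices `W` and `G` are assumptions for illustration; in the paper W is built from learned parameters rather than passed in directly.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def matvec(M, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def nalu_forward(x, W, G, eps=1e-10):
    # Additive path: sums and differences of the inputs.
    a = matvec(W, x)
    # Multiplicative path: adding in log space multiplies in linear space.
    log_x = [math.log(abs(xi) + eps) for xi in x]
    m = [math.exp(v) for v in matvec(W, log_x)]
    # Gate in (0, 1) chooses between the two paths for each output.
    g = [sigmoid(v) for v in matvec(G, x)]
    return [gi * ai + (1.0 - gi) * mi for gi, ai, mi in zip(g, a, m)]

x = [3.0, 5.0]
W = [[1.0, 1.0]]  # hypothetical saturated NAC weights

added = nalu_forward(x, W, G=[[5.0, 5.0]])         # gate near 1: about 3 + 5
multiplied = nalu_forward(x, W, G=[[-5.0, -5.0]])  # gate near 0: about 3 * 5
print(added, multiplied)
```

With W = [[1.0, -1.0]] the same two gate settings would give subtraction and division instead. Note that the log-space path works on |x|, so this sketch loses the sign of negative inputs.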

Early Results

Experiments in the paper further show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images.

In contrast to conventional architectures, the authors obtain substantially better generalization both inside and outside of the range of numerical values encountered during training, often extrapolating orders of magnitude beyond trained numerical ranges.

More details can be found in the paper: https://arxiv.org/pdf/1808.00508v1.pdf

My take

This looks like an interesting development and could have a huge impact. In a few years, we might look back at the current ways of training Neural Networks and feel that they were very rudimentary.

As for NALU, only time and more detailed tests will tell whether it truly delivers more Type A learning.

Hoping to see much more action on this in the coming months.

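Reference: Trask, A., Hill, F., Reed, S., Rae, J., Dyer, C., and Blunsom, P. (2018). Neural Arithmetic Logic Units. https://arxiv.org/pdf/1808.00508v1.pdf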