Deep Learning to “Invent Language”

Carlos E. Perez
Published in Intuition Machine · Mar 18, 2017


Credit: https://unsplash.com/collections/450707/invent?photo=2R5QtSqJtdY

OpenAI has published a short introduction to their newest research, “Learning to Communicate”. There are many trends that I watch for in the field of Deep Learning. Two related trends that I believe will be very promising areas are language learning and multi-agent communication. If you have not been watching, this week saw a tremendous release of papers on the former, culminating with OpenAI’s post stitching it all together! Let me explain what transpired in this amazing week.

Denny Britz (he says he’s a high school dropout working for Google) released a new general-purpose encoder-decoder framework for TensorFlow. This is all described in more detail in a paper: Massive Exploration of Neural Machine Translation Architectures. To quickly summarize, a team at Google cranked through 250,000 GPU hours (at $0.70 per hour at Google Cloud rates, that would set back a poor researcher $175,000; life is never fair!) training different English-German translation networks to come up with some important insights as well as some nice hyper-parameters. This is one nice gift from Google to the Deep Learning community.
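For readers new to the pattern this framework generalizes, here is a minimal encoder-decoder sketch. This is my own illustration in Keras-style TensorFlow, not the released framework’s API; the vocabulary sizes and dimensions are placeholders.

```python
# Minimal encoder-decoder sketch (illustrative only, not the released
# framework's API). Vocab sizes and dimensions are arbitrary placeholders.
import tensorflow as tf
from tensorflow.keras import layers, Model

src_vocab, tgt_vocab, emb_dim, hidden = 8000, 8000, 128, 256

# Encoder: embed the source tokens and compress them into a state vector.
enc_in = layers.Input(shape=(None,), dtype="int32")
enc_emb = layers.Embedding(src_vocab, emb_dim)(enc_in)
_, enc_state = layers.GRU(hidden, return_state=True)(enc_emb)

# Decoder: generate target tokens conditioned on the encoder's final state.
dec_in = layers.Input(shape=(None,), dtype="int32")
dec_emb = layers.Embedding(tgt_vocab, emb_dim)(dec_in)
dec_out = layers.GRU(hidden, return_sequences=True)(dec_emb, initial_state=enc_state)
logits = layers.Dense(tgt_vocab)(dec_out)

model = Model([enc_in, dec_in], logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
```

Roughly these knobs (cell type, depth, embedding and hidden sizes, attention variants) are the kinds of hyper-parameters the Google study swept over with all those GPU hours.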

Not to be outdone, another group at Google introduced something that goes even beyond the classic encoder-decoder design: an architecture they christened DRAGNN. I must say, I was initially put off by the click-bait-like name, but this is one impressive piece of work that lives up to it! As you read the paper, you quickly realize that this is a very different architecture. A sense of fear overcame me, worrying that replicating this kind of work would take an enormous amount of effort. Fortunately, a day later, I was alerted to a post on Google’s Research Blog with a very simple title: “An Upgrade to SyntaxNet, New Models and a Parsing Competition”. SyntaxNet, more popularly known as “Parsey McParseface”, was upgraded with a new set of models called “ParseySaurus”. Read up on ParseySaurus and you see the “large beast” connection.

Let’s explore why I think DRAGNN is important. You see, I have been exploring this idea of “Modular Deep Learning”: the concept that you can have modularized Deep Learning components and stitch them together to build solutions. I explored this a bit more in a previous article, “Learning to Coordinate”, where I gave a rough survey of what may be needed to make further progress. I came to the conclusion that some intermediary between Deep Learning modules may be necessary, but I didn’t know of anything in the field that approximated it. Well, DRAGNN’s TBRU (Transition-Based Recurrent Unit) seems to fit the bill! Granted, it’s designed for NLP translation; however, as the authors describe it:

In this work, we propose a modular neural architecture that generalizes the encoder/decoder concept to include explicit structure. Our framework can represent sequence-to-sequence learning as well as models with explicit structure like bi-directional tagging models and compositional, tree-structured models. Our core idea is to define any given architecture as a series of modular units, where connections between modules are unfolded dynamically as a function of the intermediate activations produced by the network.

So not only does this research have a way to stitch networks together, it also allows more expressive intermediate language representations (i.e., compositional tree-structured models) to be used. This is really great stuff, arriving just in the nick of time. One general perspective on what a Deep Learning system can do is that it can perform universal translation. Conventionally, DL systems are thought of as universal classifiers; however, we can think of them more generally as universal translators (i.e., Babelfish). From this perspective, a lot of cleverer applications become evident. I’ve come to the opinion that the language-driven perspective is the most fruitful way of thinking about Deep Learning. (Remind me to make changes to “5 Levels of Capability”.)
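To make the “connections unfolded dynamically” idea from the quote a little more concrete, here is a toy rendering of dynamic module composition. This is my own sketch, not the DRAGNN/TBRU implementation; the two units and the routing rule are made up for illustration.

```python
# Toy rendering of dynamic module composition -- NOT the DRAGNN/TBRU
# implementation, just a sketch of connections chosen from activations.
import numpy as np

def make_unit(in_dim, out_dim, seed):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(in_dim, out_dim))
    return lambda x: np.tanh(x @ W)        # one "module": a tiny transition

# A small library of modules that could be stitched together.
units = {"tag": make_unit(8, 8, 0), "parse": make_unit(8, 8, 1)}

def run(tokens, steps=6):
    state = np.zeros(8)
    trace = []
    for t in range(steps):
        x = state + tokens[t % len(tokens)]
        # The "connection" is chosen from an intermediate activation: a crude
        # rule on the state norm stands in for a learned transition system
        # deciding which unit consumes which activation next.
        name = "tag" if np.linalg.norm(state) < 1.0 else "parse"
        state = units[name](x)
        trace.append(name)
    return state, trace

final_state, trace = run([np.ones(8) * 0.3])
print(trace)   # which module ran at each step: the dynamically unfolded connections
```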

To end the week, we’ve been gifted by OpenAI with their research on “Learning to Communicate” (they are supposed to be “Open”, so I hope to see source code soon!). They developed an RL system under the constraint that useful languages are both grounded and compositional: grounded meaning that the words in the language have meaning, and compositional in that the words can be strung together to create more specific instructions. The system they set up is a cooperative multi-agent system (DeepMind did some work on competitive systems). The key technical mechanism the researchers came up with was a “differentiable communication channel” that uses the Gumbel-Softmax trick to treat the communication as consisting of categorical variables (one nice trick to remember).
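Here is a minimal sketch of that Gumbel-Softmax trick (my own illustration, not OpenAI’s code): a sender’s logits over a small symbol vocabulary are perturbed with Gumbel noise and pushed through a temperatured softmax, yielding a nearly one-hot “word” whose expression remains differentiable with respect to the logits.

```python
# Minimal Gumbel-Softmax sketch (my illustration, not OpenAI's code).
# A "word" is a relaxed one-hot vector over a small symbol vocabulary.
import numpy as np

def gumbel_softmax(logits, temperature=0.5, rng=np.random.default_rng(0)):
    # Sample Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    # Softmax over (logits + noise) / temperature: as the temperature drops,
    # the sample approaches a discrete one-hot symbol, yet the expression
    # stays differentiable with respect to the logits.
    y = (logits + gumbel) / temperature
    y = np.exp(y - y.max())
    return y / y.sum()

sender_logits = np.array([2.0, 0.5, 0.1, -1.0])   # preferences over 4 symbols
word = gumbel_softmax(sender_logits)
print(word.round(3), "-> symbol", word.argmax())
```

As the temperature is annealed toward zero, samples become effectively discrete symbols, which is what lets a discrete-looking channel be trained end to end by backpropagation.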

In other multi-agent models that we have looked at, what was learned was the behavior of each agent; the communication mechanisms, however, remained opaque and uninterpretable. In OpenAI’s research, they constrain the communication channel so that it is more language-like rather than some continuous stream of data. So it’s getting very interesting: not only are frameworks being developed that better understand sequences of tokens, we are also exploring ways to learn how to invent language.

What a week in Deep Learning! As dessert, three papers on “Deep Meta-Learning” came to my attention: “Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks”, “Learning Gradient Descent: Better Generalization and Longer Horizons”, and “Learned Optimizers that Scale and Generalize”. Finally, one cool, simple trick to improve gradient descent: hyper-gradients!
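The hyper-gradient trick is simple enough to sketch in a few lines: adapt the learning rate itself by gradient descent, using the dot product of consecutive gradients as the signal. Below is a toy version on a quadratic objective (my own sketch; the objective and constants are made up).

```python
# Sketch of hyper-gradient descent on a toy quadratic (illustrative constants).
import numpy as np

def grad(theta):                      # gradient of f(theta) = 0.5 * ||theta||^2
    return theta

theta = np.array([5.0, -3.0])
alpha, beta = 0.01, 0.001             # learning rate and hyper-learning-rate
prev_grad = np.zeros_like(theta)

for step in range(100):
    g = grad(theta)
    # Hyper-gradient update: if successive gradients point the same way,
    # increase the learning rate; if they disagree, decrease it.
    alpha += beta * g.dot(prev_grad)
    theta -= alpha * g
    prev_grad = g

print(theta, alpha)
```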

Strategy for Disruptive Artificial Intelligence: https://gumroad.com/products/WRbUs
