What’s New in Deep Learning Research: Understanding Progressive Neural Networks
The intersection between artificial intelligence (AI) and human cognition is one of the most fascinating areas of research in modern technology. Deep learning constantly tries to emulate mechanisms of the human brain in order to improve the capabilities of AI agents, and many of those mechanisms center on how humans learn and build knowledge. A recent research paper from DeepMind proposes a method that emulates the progressive nature of human learning in deep learning models. DeepMind calls this technique progressive neural networks.
A fundamental difference between how humans and AI agents learn is that the latter almost always need to start from scratch, while humans are phenomenal at leveraging prior experience to acquire new knowledge. When confronted with a new subject, we rarely start from zero; instead, we constantly try to reuse prior knowledge. Analogies, creativity, and imagination are some of the cognitive skills enabled by our ability to correlate knowledge across different subject areas. Furthermore, as humans, we seem incapable of completely forgetting knowledge acquired in our prior experiences. In AI systems, that ability is still in a very nascent state.
In the last few years, disciplines such as representation learning and transfer learning have been at the forefront of knowledge reusability. However, those techniques still have severe limitations when it comes to learning a series of related tasks in the same model. In the transfer learning approach, a model is pretrained on a source domain (where data is often abundant), the output layers of the model are adapted to the target domain, and the network is fine-tuned via backpropagation. At the moment, transfer learning has tangible drawbacks that make it unsuitable for transferring across multiple tasks. For instance, if we wish to leverage knowledge acquired over a sequence of experiences, which model should we use to initialize subsequent models? This seems to require not only a learning method that can support transfer learning without catastrophic forgetting, but also foreknowledge of task similarity.
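As a point of reference, the transfer learning recipe described above can be sketched in a few lines. This is a minimal NumPy illustration with made-up layer sizes; the names (`W_backbone`, `W_head`, `finetune_step`) are my own, not from the paper, and a real implementation would use a deep learning framework with automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "source" network: a pretrained feature layer plus an
# output head that is replaced for the target task.
W_backbone = rng.normal(size=(4, 8))   # pretrained features, kept frozen
W_head     = rng.normal(size=(8, 3))   # new head for the target domain

def features(x):
    # Frozen feature extractor from the source domain.
    return np.tanh(x @ W_backbone)

def finetune_step(x, y, lr=0.1):
    """One gradient step on the new head only; the backbone stays frozen."""
    global W_head
    h = features(x)
    pred = h @ W_head
    grad = h.T @ (pred - y) / len(x)   # gradient of mean squared error w.r.t. the head
    W_head -= lr * grad

x = rng.normal(size=(16, 4))
y = rng.normal(size=(16, 3))
before = W_backbone.copy()
finetune_step(x, y)
assert np.allclose(W_backbone, before)  # the pretrained weights are untouched
```

Note that the sketch makes the single-task limitation visible: there is one backbone and one head, and nothing in the recipe says how a third or fourth task should reuse what came before.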
Entering Progressive Neural Networks
The idea of progressive neural networks is to effectively transfer knowledge across a series of tasks. Conceptually, progressive neural networks have three major goals:
a) The ability to incorporate prior knowledge at each layer of the feature hierarchy
b) The ability to reuse old computations and learn new ones
c) Immunity to catastrophic forgetting
Contrasting with transfer learning models, which incorporate prior knowledge only at initialization, progressive networks retain a pool of pretrained models throughout training and learn lateral connections from them to extract useful features for new tasks. This progressive approach to learning achieves richer compositionality and allows prior knowledge to be integrated at each layer of the feature hierarchy.
To see progressive neural networks in practice, let's take a neural network with some number L of layers trained to perform the initial task. In the DeepMind research, this neural network is known as the initial column of the progressive network.
When it comes time to learn the second task, the model adds an additional column and freezes the weights in the first column (to avoid catastrophic forgetting). The outputs of layer l in the original network become additional inputs to layer l+1 in the new column.
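A two-column version of this wiring can be sketched as follows. This is an illustrative NumPy forward pass with tiny, made-up layer sizes; the variable names (`W1`, `W2`, `U2`) and the two-layer depth are my own simplifications, and the lateral adapters in the paper are somewhat more elaborate than a single matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.tanh  # nonlinearity, chosen arbitrarily for the sketch

# Column 1: trained on task 1, then frozen.
W1 = [rng.normal(size=(4, 8)), rng.normal(size=(8, 8))]

# Column 2: fresh weights W2 for task 2, plus a lateral connection U2
# that feeds column 1's layer-1 activation into column 2's layer 2.
W2 = [rng.normal(size=(4, 8)), rng.normal(size=(8, 8))]
U2 = [rng.normal(size=(8, 8))]

def forward(x):
    # Column 1 (frozen): its activations are computed but never updated.
    h1_1 = f(x @ W1[0])
    h1_2 = f(h1_1 @ W1[1])
    # Column 2: layer 2 sees its own layer 1 *and* column 1's layer 1.
    h2_1 = f(x @ W2[0])
    h2_2 = f(h2_1 @ W2[1] + h1_1 @ U2[0])
    return h2_2
```

During training on task 2, only `W2` and `U2` would receive gradients; `W1` stays frozen, which is what prevents catastrophic forgetting of the first task.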
If a third task is needed, the model adds a third column and connects the outputs of layer l in all previous columns to the inputs of layer l+1 in the new column.
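The pattern generalizes to any number of columns: each new column's layer l+1 sums its own input with laterally adapted activations from layer l of every earlier column. Below is a hedged sketch of that general forward pass; the function name, data layout, and per-pair lateral matrices are my own illustrative choices, not the paper's notation.

```python
import numpy as np

def progressive_forward(x, columns, laterals, f=np.tanh):
    """Forward pass through a progressive network.

    columns[k]        -- list of weight matrices for column k
                         (all columns before the last are frozen)
    laterals[k][i][j] -- matrix mapping column j's layer-(i+1) activation
                         into column k's layer (i+2)
    """
    acts = []  # acts[k][i] = activation of layer i in column k
    for k, W in enumerate(columns):
        h = [f(x @ W[0])]
        for i in range(1, len(W)):
            z = h[i - 1] @ W[i]
            # Lateral inputs from the same depth in all previous columns.
            for j in range(k):
                z = z + acts[j][i - 1] @ laterals[k][i - 1][j]
            h.append(f(z))
        acts.append(h)
    return acts[-1][-1]  # output of the newest column
```

One consequence of this design, worth noting, is that the number of parameters grows with each task, since every new column carries its own weights plus lateral adapters to all of its predecessors.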
The innovation of progressive neural networks is not so much a brand-new learning technique but rather the combination of a series of well-known methods into an innovative learning model. Progressive networks provide a model architecture in which catastrophic forgetting is prevented by instantiating a new neural network (a column) for each task being solved, while transfer is enabled via lateral connections to features of previously learned columns.
The goal of continual and reusable learning is still years away in AI systems, but I feel that progressive neural networks are a step in the right direction. The DeepMind team applied progressive learning to master a series of Atari games, and the results, which are illustrated in the research paper, were nothing short of remarkable.