DeepMind: Elastic Weight Consolidation, or how to fix catastrophic forgetting

Lock those weights.

Once a neural network has been trained on a task, training it on a second task with the usual methods tends to overwrite what it learned on the first. This is what we call catastrophic forgetting.

Thankfully, DeepMind figured out a way to fix it.

The trick is to lock the weights that are used to solve the first task when training for a new task. DeepMind was heavily inspired by the synaptic consolidation that happens in our brain: the plasticity (ability to change) of synapses that are essential to previously learned tasks is reduced as we learn. Here is the animation from DeepMind’s website.

From https://deepmind.com/blog/enabling-continual-learning-in-neural-networks/

By locking those weights, neural networks are able to learn a new task without forgetting the previous one(s).

From https://deepmind.com/blog/enabling-continual-learning-in-neural-networks/ (the scale is normalized so that human performance is 1)

Results

Elastic Weight Consolidation (EWC) works! As you can see on the graph, EWC performs significantly better than the normal learning algorithm (the "no penalty" curve).


How it works

I lied when I said that the algorithm locks weights. Technically, it doesn’t. The following image (from the paper) is a good way to think about EWC:

From http://www.pnas.org/content/early/2017/03/13/1611835114.full

EWC tries to converge to a point where BOTH task A and task B have a low error. To achieve that, EWC uses a different loss function.

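On top of the task B loss, it adds a quadratic penalty that pulls each weight back toward the value it had after task A, scaled by how important that weight was to task A. lambda sets the importance of task A relative to task B.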
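
To make that loss more concrete, here is a minimal sketch in plain Python. In the paper the loss is L(theta) = L_B(theta) + sum_i (lambda / 2) * F_i * (theta_i - theta*_A,i)^2, where F is the diagonal of the Fisher information matrix and theta*_A are the weights found for task A. Everything else below is an assumption made for illustration: the function names, the two-weight "network", the hand-picked Fisher values, and the toy task B loss are all made up, and this is not DeepMind's code.

```python
def ewc_penalty(theta, theta_star_a, fisher, lam):
    """Quadratic penalty: sum_i (lam / 2) * F_i * (theta_i - theta*_A,i)^2."""
    return sum(
        0.5 * lam * f * (t - t_a) ** 2
        for t, t_a, f in zip(theta, theta_star_a, fisher)
    )


def ewc_grad(task_b_grad, theta, theta_star_a, fisher, lam):
    """Gradient of the task B loss plus the gradient of the EWC penalty."""
    return [
        g + lam * f * (t - t_a)
        for g, t, t_a, f in zip(task_b_grad(theta), theta, theta_star_a, fisher)
    ]


def train(theta, grad_fn, lr=0.05, steps=500):
    """Plain gradient descent; returns a new list of weights."""
    for _ in range(steps):
        theta = [t - lr * g for t, g in zip(theta, grad_fn(theta))]
    return theta


# Toy setup (all values are hypothetical).
theta_star_a = [1.0, 1.0]  # weights after training on task A
fisher = [10.0, 0.0]       # weight 0 matters a lot to task A, weight 1 not at all
lam = 1.0                  # importance of task A relative to task B


def task_b_grad(theta):
    """Gradient of a toy task B loss, sum_i (theta_i - 3)^2, minimized at 3."""
    return [2.0 * (t - 3.0) for t in theta]


# Training on task B without the penalty vs. with the EWC penalty.
theta_plain = train(theta_star_a, task_b_grad)
theta_ewc = train(
    theta_star_a,
    lambda th: ewc_grad(task_b_grad, th, theta_star_a, fisher, lam),
)

print("no penalty:", theta_plain)
print("EWC:       ", theta_ewc)
```

Without the penalty, both weights drift to the task B optimum and task A is forgotten. With the penalty, the weight that matters to task A stays close to its old value (about 1.33 in this toy setup), while the weight that task A does not care about is free to move to 3. That is why "locking" is only an approximation: important weights are held by a spring, not frozen.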

For those who want to dig deeper, here is the full paper: http://www.pnas.org/content/early/2017/03/13/1611835114.full