Evolving Agents That Don’t Forget

A single neural network is able to perform multiple reinforcement learning tasks.

This article summarises some of the research I did as part of my undergraduate degree at the University of Southampton. I developed a novel technique for reducing ‘catastrophic forgetting’, the pitfall that machine learning systems encounter when trying to learn several tasks sequentially. Rather than focusing on learning systems, however, the technique was developed for evolutionary systems that evolve a population of neural networks to solve reinforcement learning tasks. The original work can be found on arXiv.

The curse of forgetting

Catastrophic forgetting is the act of overwriting what has been previously learnt

The ability to learn new skills on top of existing ones is a crucial concept for the continuing improvement of AI systems. This is the essence of continual learning — a system that is able to perform one task must learn a novel task while retaining its ability in the original task.

When a neural network is trained on a new task, its weights are adjusted. But because all the weights were finely tuned to solve the original problem, a major adjustment to them in order to solve the new problem can dramatically reduce performance on the original task; in other words, the previous task has been forgotten. Learning must therefore occur in a way that retains knowledge of (and performance on) the original task, while at the same time learning as much as possible about the new one.

One way to do this is to ‘protect’ the weights that are most important to the original task and allow the less important weights to change; the assumption is that the important weights contain most of the knowledge, so protecting them alleviates the symptoms of forgetting. This is the idea behind the Elastic Weight Consolidation (EWC) technique developed by Kirkpatrick et al., in which the gradients of the network weights are used to estimate the importance of each weight. When learning a new task, a penalty term is added to the loss function that penalises changes to the important weights.
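To make the idea concrete, here is a minimal sketch of an EWC-style penalty. The function name and the toy importance values are illustrative; EWC itself estimates importance from the (diagonal) Fisher information, which is approximated here by a precomputed `importance` array.

```python
import numpy as np

def ewc_penalty(weights, old_weights, importance, lam=1.0):
    """Quadratic EWC-style penalty: moving important weights is costly.

    `importance` approximates the diagonal of the Fisher information,
    i.e. squared gradients averaged over the original task's data.
    """
    return lam / 2 * np.sum(importance * (weights - old_weights) ** 2)

# Toy example: weight 0 is important to the old task, weight 1 is not.
old_w = np.array([1.0, -0.5])
imp = np.array([10.0, 0.1])
new_w = np.array([1.2, 0.5])
penalty = ewc_penalty(new_w, old_w, imp)
```

Even though weight 1 moved five times further than weight 0, almost all of the penalty comes from weight 0, so optimisation is steered towards changing the unimportant weight instead.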

The technique developed for my work was based on the EWC method, but adapted for evolutionary systems instead of learning systems. Before delving into this novel method, some background on neuroevolution is required.

The wonders of neuroevolution

Neuroevolution has some neat tricks up its sleeve

Genetic algorithms develop solutions to problems by mimicking biological evolution, using natural selection, crossover and mutation to maintain a population of agents that (hopefully) become increasingly fit over many generations. This can be applied to neural networks in a process known as neuroevolution, the evolutionary equivalent of using back propagation to develop a network in a learning system.
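The selection–crossover–mutation loop can be sketched in a few lines. This is a generic, illustrative genetic algorithm over real-valued genomes (tournament selection, one-point crossover, Gaussian mutation), not the specific evolutionary strategy used in the work.

```python
import random

def evolve(fitness, genome_len=8, pop_size=20, generations=50,
           mutation_rate=0.1, seed=0):
    """Minimal genetic algorithm: tournament selection, one-point
    crossover, and per-gene Gaussian mutation on real-valued genomes."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def pick():  # tournament selection of size 2
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        new_pop = []
        for _ in range(pop_size):
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, genome_len)      # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [g + rng.gauss(0, 0.1) if rng.random() < mutation_rate
                     else g for g in child]         # Gaussian mutation
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Toy task: maximise the sum of the genes.
best = evolve(lambda g: sum(g))
```

In neuroevolution, the genome would encode a network's weights (and, with NEAT, its topology), and the fitness function would be the agent's reward on the reinforcement learning task.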

One of the cool tricks that neuroevolution can do is change the network topology while developing the population of networks; this is known as neuroevolution of augmenting topologies (NEAT). This allows for architecture as well as parameter search to occur at the same time, and is key to the technique used in this work to reduce forgetting.
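One of NEAT's structural mutations, splitting an existing connection by inserting a new node, can be sketched as follows. The genome representation (a set of node ids plus a dict of weighted connections) is a simplification for illustration; real NEAT genomes also track innovation numbers for crossover.

```python
import random

def add_node(nodes, connections, rng):
    """NEAT-style structural mutation: split a connection (src -> dst)
    by inserting a new node between src and dst.
    `connections` maps (src, dst) -> weight."""
    (src, dst), w = rng.choice(sorted(connections.items()))
    new = max(nodes) + 1
    nodes.add(new)
    del connections[(src, dst)]
    connections[(src, new)] = 1.0   # NEAT convention: incoming link weight 1
    connections[(new, dst)] = w     # outgoing link keeps the old weight
    return new

rng = random.Random(0)
nodes = {0, 1, 2}                               # two inputs, one output
connections = {(0, 2): 0.5, (1, 2): -0.3}
new_id = add_node(nodes, connections, rng)      # network grows by one node
```

Starting the new node's incoming weight at 1 and reusing the old weight for the outgoing link means the mutation initially changes the network's behaviour as little as possible, which is exactly the kind of gentle growth the weight protection method later relies on.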

Neuroevolution is usually a gradient-free method, i.e. it doesn’t use the gradients of each network parameter with respect to the error function (as in normal back propagation learning); instead it uses an evolutionary process that changes the network as a whole rather than altering each weight individually. Since the gradient is not used, and therefore not calculated, in normal neuroevolution, it would be beneficial if a technique for reducing forgetting didn’t require the gradients (which rules out the EWC method mentioned above), as the gradient calculation is often expensive and sometimes intractable.

A new approach

Time to grow beyond forgetting

As previously said, the idea of the EWC method was to protect the weights that were most important to the original problem; this involved using the gradients of the network weights. However, since these gradients aren’t readily available in neuroevolution, how else can the importance of the weights be established?

An alternative approach is to consider all of the weights to be of equal importance, and then add new weights to the network with the specific intention of using those new weights to learn the new task while keeping the old weights the same, hopefully retaining the knowledge and performance on the previous task. This can be achieved through neuroevolution as it allows the topology of the network to evolve at the same time as the weights!

This is the main idea behind the ‘weight protection’ method developed in this work: begin with small networks that are evolved for task A, then encourage them to grow when learning task B and penalise any changes in the original network weights. This easily extends to learning more than two tasks: once the network has learnt tasks A and B, the weights that are now being protected give high performance on both, and the new weights are solely used for improving performance on task C, and so on.

A little bit of maths

Just to spice things up a bit from chunks of text

Only joking, the maths will stay in the arXiv upload. However, at a high level, the weight protection is implemented as part of the fitness function through two additional penalty terms: one to penalise networks for changing from the ‘best’ configuration for the previous task and one that regularises the newly added weights. Both have tunable hyper-parameters to allow a trade-off between preserving performance on the previous task(s) and learning the new task.
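As an illustrative (not exact) form of that fitness function, the two penalty terms might look like this: a quadratic drift penalty on the previously evolved weights, and an L2 regulariser on the newly added weights, with `lam1` and `lam2` standing in for the tunable hyper-parameters.

```python
import numpy as np

def protected_fitness(task_reward, old_weights, best_old_weights,
                      new_weights, lam1=1.0, lam2=0.1):
    """Weight-protected fitness (illustrative form): the raw task reward
    minus a penalty for drifting from the best configuration on previous
    tasks, minus a regulariser on the newly added weights."""
    drift = np.sum((old_weights - best_old_weights) ** 2)
    reg = np.sum(new_weights ** 2)
    return task_reward - lam1 * drift - lam2 * reg

best_old = np.array([1.0, -0.5])
# An agent that keeps its old weights intact pays no drift penalty...
f_kept = protected_fitness(10.0, best_old, best_old, np.array([0.3]))
# ...while one that overwrote them is penalised, favouring growth instead.
f_drifted = protected_fitness(10.0, np.array([0.2, 0.4]), best_old,
                              np.array([0.3]))
```

Raising `lam1` preserves more performance on previous tasks at the cost of flexibility on the new one; raising `lam2` keeps the newly grown weights small, which is the trade-off the hyper-parameters control.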

Preliminary results

And, did it work?

The newly developed weight protection technique was pitted against plain neuroevolution (i.e. without the weight protection penalty terms). It was also tested alongside an existing approach to reducing catastrophic forgetting in neuroevolutionary systems, which aims to evolve modular networks. By using multi-objective optimisation, the evolutionary strategy could support weight protection at the same time as the modularity technique, so four combinations were tested: normal evolution, modular evolution, weight protection, and modular evolution with weight protection.

One of the ways of testing the evolutionary strategies was to monitor the drop-off in performance on an initial task while learning subsequent tasks. Using tasks from the OpenAI Gym, the following figure shows how the weight protection technique reduces the loss of performance on the first task as the later tasks are learnt.

As the figure shows, evolution with weight protection and modularity was the most effective out of the four techniques, and weight protection was successful even on its own without modularity. Overall — it works!

And in conclusion…

There’s always more that can be done

In further tests, the weight protection technique was able to find a network that had a very high performance across all four tasks — a feat that both normal evolution and modular evolution were unable to accomplish. However, it was only tested against slightly outdated benchmarking tasks; most research has moved onto more complex tasks such as beating Atari games, so further testing is required.

Still, this work showcases the potential effectiveness of weight protection, and more importantly demonstrates the transfer of a technique from a learning system to an evolutionary system (the adaptation of EWC into weight protection proved successful). This could lead to further development of the weight protection idea in both domains, and hints that more techniques could be shared between the two.




PhD student at the Alan Turing Institute and the University of Southampton. Machine Learning and Explainable AI www.jearly.co.uk
