Catastrophic Importance of Catastrophic Forgetting in Neural Networks

Albert Ierusalem
6 min read · Aug 30, 2018

This article is the philosophy and discussion part of the paper of the same name.

This paper is the first part of INFORNO


There are a number of forms of forgetting in the human brain, and forgetting is a normal, adaptive, and necessary part of learning. One of the more interesting forms is active forgetting, which helps us concentrate on a specific task by moving all information that is unnecessary at that moment to unconscious memory. Hermann Ebbinghaus, a German psychologist, became the first person to study memory experimentally. In his early experiments, he encountered the problem of proving that something has been completely forgotten, because supposedly forgotten information can still affect behaviour and is quite often remembered later. As a result, forgetting was defined as the inability to extract from memory, at a particular moment, something that was readily extracted from memory earlier.

Later experiments on memory were conducted on the mollusc Lymnaea stagnalis because its nerve cells are rather large, and many of them have been identified and their functions described. There is one nerve cell without which L. stagnalis cannot learn a new skill. If this cell is destroyed, the molluscs not only lose the ability to learn, they also do not forget previously learned behaviours. The model in this study translates this function into artificial neural networks and defines an architecture with active forgetting mechanisms, which gives the model its name: the active forgetting machine (AFM). The AFM contains special neural networks that allow temporary forgetting of unnecessary information by disabling unwanted neurons, after which other combinations of neurons are activated to learn and solve a task.

Active forgetting machine

In the classical interpretation of artificial neural networks, all neurons in the hidden layer are initially activated, and in order to concentrate on a specific task, it is necessary to turn some of them off; in other words, it is necessary to ‘forget’ all unnecessary information. In the context of artificial neural networks, activation means that the neurons are involved in forward propagation during evaluation and backward propagation during training.

While multitasking ability allows the proposed model to switch between several problems, it is also useful when solving a single problem. Almost any task can be hierarchically divided into sub-tasks, and the depth of this partition increases with the complexity of the basic task. Achieving a goal in such multilevel environments is a problem. When mechanisms of active forgetting are introduced, the model can simplify goal achievement by breaking tasks into simpler steps and training a separate combination of neurons for each sub-task. This trick naturally increases the ability of the model to select the correct action.

AFM model with variational E algorithm

To describe a general class of active forgetting mechanisms, the proposed Active Forgetting Machine model is composed of a forgetting net V, an associative controller C, and a forgetting algorithm E, which is used to find the best minimal combination of necessary neurons F. C is trained to activate the correct neurons F. Forgetting net V uses forgetting layers, which multiply a layer of neurons by a binary mask in both the forward and the backward pass.
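The forgetting layer described above can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation: the function names `forget_forward` and `forget_backward` and the example values are assumptions made here, and the mask F is simply taken as given instead of being found by algorithm E.

```python
# Minimal sketch of a forgetting layer (hypothetical names, not from the paper).
# A binary mask F decides which neurons take part in the forward pass
# and which neurons receive gradients in the backward pass.

def forget_forward(activations, mask):
    """Zero the activations of neurons whose mask bit is 0."""
    return [a * m for a, m in zip(activations, mask)]

def forget_backward(grad_output, mask):
    """Let gradients flow only through neurons whose mask bit is 1."""
    return [g * m for g, m in zip(grad_output, mask)]

hidden = [0.5, -1.2, 0.8, 2.0]   # activations of one hidden layer
mask_F = [1, 0, 1, 0]            # assumed mask: neurons 1 and 3 are "forgotten"
grad = [0.1, 0.2, 0.3, 0.4]      # gradient arriving from the next layer

masked_hidden = forget_forward(hidden, mask_F)
masked_grad = forget_backward(grad, mask_F)
```

Because the same mask is applied in both directions, a forgotten neuron neither influences the output nor has its weights changed by training on the current task.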

Associative controller C is a neural network whose output layer has the same number of neurons as the forgetting layers of V. C is trained to emit the mask Ft, defined by algorithm E, whenever it receives a sample of task M.
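As a rough illustration of the controller's role, the sketch below trains a single-layer network to emit a fixed mask for each task. Everything in it is an assumption made for the example: the perceptron update rule stands in for whatever training procedure the paper uses, the two one-hot task samples are toy data, and the target masks are written by hand rather than produced by algorithm E.

```python
# Minimal sketch of an associative controller (assumed design, not the paper's).
# One threshold unit per neuron of the forgetting layer; the controller
# learns to map a task sample to that task's binary mask.

def predict_mask(weights, biases, sample):
    """Emit one mask bit per output unit for the given input sample."""
    mask = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, sample)) + b
        mask.append(1 if z > 0 else 0)
    return mask

def train_controller(samples, target_masks, epochs=10, lr=1.0):
    """Perceptron rule: nudge each unit towards its target mask bit."""
    n_in, n_out = len(samples[0]), len(target_masks[0])
    weights = [[0.0] * n_in for _ in range(n_out)]
    biases = [0.0] * n_out
    for _ in range(epochs):
        for x, target in zip(samples, target_masks):
            pred = predict_mask(weights, biases, x)
            for j in range(n_out):
                err = target[j] - pred[j]
                for i in range(n_in):
                    weights[j][i] += lr * err * x[i]
                biases[j] += lr * err
    return weights, biases

# Toy data: one sample per task, with hand-written masks for a 4-neuron layer.
samples = [[1.0, 0.0], [0.0, 1.0]]
target_masks = [[1, 1, 0, 0], [0, 0, 1, 1]]
weights, biases = train_controller(samples, target_masks)

mask_a = predict_mask(weights, biases, [1.0, 0.0])
mask_b = predict_mask(weights, biases, [0.0, 1.0])
```

Once trained, the controller's output can be passed straight to the forgetting layers of V as the mask for the current sample.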

AFM model with evolutionary E algorithm

For example, if the problem to be solved involves two distinctly different tasks A and B, the sequences of actions that lead to each goal are also different. To solve the two problems, the model should clearly define which group of neurons, Fa, is responsible for performing actions in task A. Having determined this, the model trains a completely different group of neurons, Fb, to achieve the goal in task B. Once the groups of neurons are defined, the model can switch between the strategies depending on the situation, activating different groups of neurons.
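The two-task example can be made concrete with a toy model. The sketch below is an assumption-laden illustration, not the paper's experiment: the model is a single layer of four scalar weights, the masks Fa and Fb are disjoint and chosen by hand, and the two tasks are simple linear targets invented for the demonstration.

```python
# Toy demonstration of task switching with disjoint neuron groups.
# All names, masks, and data here are hypothetical.

def predict(weights, mask, x):
    """Output of the active neurons only: sum of masked weights times x."""
    return sum(m * w for m, w in zip(mask, weights)) * x

def train(weights, mask, data, epochs=200, lr=0.1):
    """Gradient descent on squared error; masked-out weights get zero update."""
    for _ in range(epochs):
        for x, target in data:
            err = predict(weights, mask, x) - target
            for i, m in enumerate(mask):
                weights[i] -= lr * err * m * x
    return weights

weights = [0.0, 0.0, 0.0, 0.0]
F_a = [1, 1, 0, 0]                   # neurons assigned to task A
F_b = [0, 0, 1, 1]                   # neurons assigned to task B
task_a = [(1.0, 2.0), (2.0, 4.0)]    # task A: y = 2x
task_b = [(1.0, -3.0), (2.0, -6.0)]  # task B: y = -3x

train(weights, F_a, task_a)
weights_after_a = list(weights)
train(weights, F_b, task_b)          # training B leaves A's neurons untouched
```

Switching the mask is all that is needed to move between the two strategies: `predict(weights, F_a, x)` still solves task A after task B has been learned, because the weights selected by Fa were never updated during training on B.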


Artificial intelligence systems are not yet able to cope with tasks with a deep hierarchy where people demonstrate quite acceptable results. This hypothesis demonstrates an alternative view of solving this problem. It is proposed that humans are able to solve deep hierarchy tasks using systems of active forgetting. If active forgetting systems are introduced into artificial intelligence systems, this hypothesis asserts the following:

• Forgetting as universal hierarchical architecture.

• The paradox of planning uselessness.

Hypothesis 1: The processing of the environment hierarchy by the agent occurs due to the hierarchy of active neural forgetting processes. The group of neurons Fh is allocated to the task, and a subgroup of these neurons, Fhh, is allocated to the sub-task, where h is a hierarchy level of tasks in the environment.

To solve the problem of tasks with a hierarchy, the existing AI architectures divide the tasks into levels. With temporal abstraction, models are divided into two or more hierarchical stages, but it is impossible to clearly define a finite number of hierarchy levels. It is also not possible to ultimately determine exactly to which hierarchical level a particular task belongs.

With the AFM, operations occur in the space of neurons. As described above, neurons are determined for each task, but while the model is learning a new task, its neurons are selected from among the neurons already used to solve a higher-level problem. This allows a new task to be learned more quickly through transfer, because the neurons have already been trained for a more general task in the same context.
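The idea that a sub-task draws its neurons from the parent task's group can be expressed as a constraint on masks. The sketch below is only an illustration under assumed names (`candidate_submasks`, `is_submask`) and an assumed parent mask; how the best sub-mask is actually chosen is the job of algorithm E and is not modelled here.

```python
# Sketch: sub-task masks are drawn only from neurons active in the parent mask.
# Function names and the example mask are hypothetical.
from itertools import combinations

def active_indices(mask):
    """Indices of neurons whose mask bit is 1."""
    return [i for i, m in enumerate(mask) if m]

def candidate_submasks(parent_mask, size):
    """Yield every mask with `size` active neurons chosen from the parent's."""
    for chosen in combinations(active_indices(parent_mask), size):
        yield [1 if i in chosen else 0 for i in range(len(parent_mask))]

def is_submask(child, parent):
    """True if every neuron active in `child` is also active in `parent`."""
    return all(c <= p for c, p in zip(child, parent))

F_h = [1, 1, 0, 1, 0, 0]                    # neurons allocated to the task
subs = list(candidate_submasks(F_h, 2))     # possible groups for a sub-task
```

Restricting the search to the parent's active neurons shrinks the number of candidate groups and guarantees that a sub-task starts from neurons already trained in the same context.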

Hypothesis 2: If, based only on experience, the system is able to select a correct action in each situation, planning becomes unnecessary.

In the classical concept, planning is the process of mapping out upcoming activities so that the most correct actions can be chosen. If the depth of the environment's hierarchical partition is minimal, it is not difficult to plan exactly which actions must be taken to reach each goal. However, since the hierarchical partition can be of any depth, the sequence of predicted actions can be very long and expensive to compute.

Montezuma’s Revenge for the Atari 2600 is a suitable example for demonstrating the hypotheses. Using natural language guided reinforcement learning, this study presented implicit divisions of tasks with natural language instructions. If each set of instructions is presented as an independent task, it is possible to move from the space of natural language to the space of neurons. Thereafter, when moving from sub-task to sub-task, certain neurons will be activated, which will lead the agent to the most beneficial activity in each of the sub-tasks. If the architecture is trained to perform each task well, there is no need for planning within each sub-task. Based only on experience, the agent is able to perform actions that head towards the next goal. There is also no need to plan the sequence of sub-tasks to be performed: if the system achieves the goal in one sub-task, it has passed correctly to the next sub-task, where the correctness of a sub-task is judged by its relevance to the main goal.

Still, it cannot be said that planning is unnecessary altogether. In reinforcement learning, planning exists as an explicit activity that should be considered one of the sub-tasks. In some cases, just as with other tasks, the agent carries out the planning activity. Planning mechanisms are used to set goals, and may be used to determine reliably which group of neurons needs to be activated to solve the problem. Combining overcoming forgetting with Visual Reinforcement Learning with Imagined Goals can help models settle on a target and more accurately activate the necessary group of neurons. This approach saves resources compared to planning, because all information about the necessary actions is stored in neurons.