Hopfield nets and the brain

Serban Liviu
18 min read · Sep 10, 2019


In this article we will discuss Hopfield networks: how they work, and how some key parts of our brains involved in learning and memory seem to be structured and to operate much like them. We will first establish what Hopfield nets are and show some concrete examples of them in action. Then we will explore the hippocampus, a part of the brain involved in learning and in episodic/declarative memory formation. It is widely accepted that a specific part of the hippocampus called CA3 (Cornu Ammonis 3) works as an auto-associative memory much like a Hopfield network, and that it is involved in activating entire episodic memories from incomplete parts of those memories, a process known as pattern completion.

The reason for writing this blog post is primarily that most of the reading material about Hopfield nets is either too mathematically oriented, or too dry, with a few math formulas and a graph-oriented figure but no emphasis on actually understanding what these nets do and why. I hope you enjoy it.

Why are Hopfield nets so interesting and intriguing?

  1. Because they are unsupervised
  2. They do not learn using backpropagation. Instead they learn using biologically inspired, and very simple, Hebbian learning. There is no indication that our brains learn using complex backpropagation (i.e. computing the partial derivatives of a cost function with respect to the weights)
  3. They memorize patterns from just one sample. No need for a training set as we see in traditional machine learning. Very much what the brain does: learning from one-shot experiences
  4. We see them in our brains, in the Cornu Ammonis region 3 (CA3) of the hippocampus, where they participate in learning and in episodic memory formation/consolidation
  5. They work as an auto-associator and an error corrector. Given a corrupted pattern or just a few "parts" of a pattern, the Hopfield net will be able to reconstruct it. Much like an autoencoder, with the difference that an autoencoder uses backpropagation and a training set with hundreds of samples

…..but don't hold your breath yet. You might ask: OK, that's all fine, but then why isn't this model the paradigm in deep learning? Why do we use feedforward, convolutional and LSTM networks instead? First, because Hopfield nets do not work as classifiers the way a feedforward net does, and they cannot do regression or anomaly detection either; they are associative networks. But the main reason they have fallen from grace is capacity. Capacity is the main problem with these types of nets: a Hopfield net can reliably store only about 0.15N patterns, so a net with 100 neurons can store at most 14 to 15 patterns. If you attempt to store more than that, the network will fail to properly memorize those patterns, and we could end up with the network memorizing bogus patterns that are combinations of other previously stored ones. That said, there is a lot of active research into increasing the capacity of Hopfield nets while maintaining their one-shot learning ability.

In other words, Hopfield nets, along with Self Organizing Maps, are believed to hold part of the secret of how our brains learn and consolidate memories. Even though not that much is known about the hippocampus, or the cortex for that matter, a lot of research is converging on the idea that these old and simple models could explain more about the brain than any fancy, industry-approved model such as a convolutional neural net ever could.

What is a Hopfield net and how does it work?

Here we will show all the technical details of what these nets look like, how they work and how they are trained. It's OK if you get lost in the technical details; they will all start to make sense when we go through a practical example of learning actual patterns. So let's begin.

A Hopfield net is a set of neurons that are:

  • Bidirectionally connected to each other with symmetric weights, i.e. the weight between any two neurons i and j satisfies w_ij = w_ji.
  • Not self-connected, which means that w_ii = 0.

Both properties are illustrated in Fig 1, where a Hopfield network consisting of 5 neurons is shown.

Fig. 1. Hopfield network architecture.

One property that the diagram fails to capture is the recurrency of the network. Hopfield networks are recurrent because the inputs of each neuron are the outputs of the others, i.e. the network possesses feedback loops, as seen in Fig. 2. This property is best understood through the recall process.

Fig. 2. A Hopfield network consisting of 5 neurons with feedback loops.

How do they learn?

Hopfield nets learn using the very simple Hebbian rule: the value of a weight w_ij between two neurons a_i and a_j is the product of the values of those neurons.

Simply put: w_ij = a_i · a_j

The reason behind that is that any two cells or systems of cells that are repeatedly active at the same time will tend to become ‘associated’ so that activity in one facilitates activity in the other.

So if a_i is, say, 10 and a_j is also 10, then the weight between them would be 100. If both a_i and a_j had the value -10, the weight would still be 100. In other words, neurons that have the same value (or the same sign) get a large positive weight; they "attract" each other. On the contrary, if a_i were 10 and a_j were -10, the weight between them would be -100, and the neurons would repel each other.
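This rule is just an outer product. Here is a minimal sketch in NumPy (the variable names are mine, and ±1-valued neurons are assumed, as the rest of this post uses):

```python
import numpy as np

# Hebbian rule: the weight between two neurons is the product of their values.
a = np.array([1, -1, 1, -1])      # example neuron states (+1 / -1)
W = np.outer(a, a).astype(float)  # w_ij = a_i * a_j
np.fill_diagonal(W, 0)            # no self-connections: w_ii = 0

# Same-sign pairs get a positive weight ("attract"),
# opposite-sign pairs get a negative one ("repel").
print(W[0, 2])  # a_0 and a_2 are both +1 -> 1.0
print(W[0, 1])  # a_0 = +1, a_1 = -1     -> -1.0
```

Note that the resulting matrix is automatically symmetric, which is exactly the w_ij = w_ji property from the definition above.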

The main Hebbian postulate is often summarized as:

Neurons that fire together wire together, neurons out of sync, fail to link.

Hebbian learning is biologically inspired, unlike, say, backpropagation. It seems intuitive that nature came up with some simple rules for learning rather than the complicated partial-derivative machinery we see in backpropagation, which does not seem to make any biological sense. It has been shown that biological neurons that fire together create more and more synapses between themselves, thereby strengthening the connection between them (i.e. the weight gets stronger), whereas neurons with opposite firing patterns do not form these strong links, and whatever synapses existed between them will eventually degrade over time. A very nice video exactly about this:

So that's how learning takes place in a Hopfield net.

How does feedforward work in Hopfield nets?

Just like in any other neural net, we have learning and feedforward. Once the network has learned something, we want to give it a similar pattern and see the output. In the case of feedforward/LSTM nets used for classification, the output might be a discrete value; in the case of a Hopfield net, the output is a reconstructed pattern. So to do feedforward in a Hopfield net, we initialize the neurons with the values of a pattern and then go through each neuron and compute the weighted sum of its inputs. If that sum has the same sign as the value of the neuron, we do nothing; otherwise we flip the neuron. This is called letting the network "evolve" towards a learned pattern.

We will start from the assumption that the value of each neuron in the net is either +1 or -1.

The weighted sum for any neuron a_i looks like this: h_i = Σ_j w_ij · a_j

That's like in any other neural net: we take the values of all neurons a_j that connect to a_i (in the case of a Hopfield net, that is every other neuron, since the network is a complete graph), multiply each one by the weight w_ij between it and a_i, and sum up all these values.

But in a Hopfield net's feedforward procedure we are not interested in the actual value of that sum, only in its sign.

Then the value of that neuron becomes: a_i = +1 if h_i ≥ 0, and -1 otherwise.

So let's say the value of the neuron is +1. If the weighted sum is positive, the value of the neuron doesn't change; if the sum is negative, the value changes from +1 to -1. In other words, the value of the neuron changes to match the sign of its weighted sum. We do this for all the neurons in the net until we end up in a "stable" state, which is when the sign of every neuron matches the sign of its weighted sum. At that point the network stops evolving.
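The whole update loop above can be sketched in a few lines of NumPy (a sketch under my own naming, not a canonical implementation; it uses the +1-on-tie convention for a zero weighted sum):

```python
import numpy as np

def evolve(W, state, max_sweeps=100):
    """Asynchronously flip each neuron to match the sign of its weighted
    sum, until no neuron changes, i.e. the network is stable."""
    s = np.array(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = W[i] @ s                # weighted sum of inputs to neuron i
            new = 1 if h >= 0 else -1   # only the sign matters
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:                 # every sign already matches: stable
            break
    return s

# Store one pattern with the Hebbian rule, corrupt one bit, then recall.
p = np.array([1, 1, -1, -1, 1])
W = np.outer(p, p).astype(float)
np.fill_diagonal(W, 0)
noisy = p.copy()
noisy[0] = -1                           # flip one neuron
print((evolve(W, noisy) == p).all())    # the stored pattern is recovered
```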

The “energy” function of a Hopfield net

Like any neural net, the Hopfield net also has a "cost" function associated with it. The problem is that it is not really a cost function: unlike traditional cost functions, it is not a function of the weights of the network but rather a function of the states of the network (the values of the neurons). If you are familiar with feedforward neural nets and with cost functions such as mean squared error and/or cross entropy, this will be very confusing. Not to mention that the term itself, energy, is pretty intimidating.

In general, a cost function for a neural net measures the error between the actual output of the net given a sample and the desired output from the training set, and the idea is to minimize this function using gradient descent. But in the case of a Hopfield network, we do not have any labeled training set; we just "look" at some patterns and memorize them. So we do not really have a notion of "error", since there is no "teacher" telling us the actual label of anything. Think of the way we usually memorize experiences in our brain: we just experience them, and that is how we learn and later remember them. That's it.

The energy of a Hopfield net is minus half the sum, over all pairs of neurons, of the weight between them times both neuron values: E = -1/2 · Σ_ij w_ij · a_i · a_j. We divide by 2 because the weights are symmetric, so each pair is counted twice. Every flip of a neuron that we've seen above lowers the total energy of the net. So we can think of the surface of the energy function, which is a function of the states of the network: if a net has N neurons, the energy function is an N-dimensional function.
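A sketch of that energy computation, showing that a corrupted state sits higher on the energy surface than the stored pattern (names and the toy pattern are mine):

```python
import numpy as np

def energy(W, s):
    # E = -1/2 * sum_ij w_ij * s_i * s_j; the factor 1/2 compensates
    # for counting each symmetric pair (i, j) twice.
    return -0.5 * s @ W @ s

p = np.array([1, 1, -1, -1, 1])   # a stored pattern
W = np.outer(p, p).astype(float)  # one-shot Hebbian learning
np.fill_diagonal(W, 0)

noisy = p.copy()
noisy[0] = -1                     # corrupt one neuron

print(energy(W, p))               # -10.0 (bottom of the bowl)
print(energy(W, noisy))           # -2.0  (higher up the slope)
```

Flipping the corrupted neuron back toward the sign of its weighted sum takes the state from -2.0 back down to -10.0, which is the "rolling into the valley" picture described next.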

We see in the image above that every learned pattern is a bowl in the surface of the energy function. When we do feedforward starting with an incomplete pattern (the red dot in the image), we flip neurons until we reach the bottom of the valley, i.e. until the learned pattern is reconstructed. The learned patterns are minima of the energy function because, as we've said earlier, each flip lowers the energy, and we stop once we reach a stable state where no neuron will flip.

Give me a practical example. This is too boring and complicated.

So we will start with a practical example of a Hopfield net that learns some letters. Our network will have 25 neurons, but for the sake of understanding, the neurons will be arranged in a 5×5 grid.

In general, this is how a 20-neuron net could look:

but obviously that looks pretty scary. So it is better to present the network as a grid of neurons, as in the image below:

Each square is a neuron.

The blue neurons have the value +1

and the white neurons have the value -1

As you can see we are not showing the weights between the neurons because that will complicate the picture too much. But keep in mind that there is a symmetric weight between each pair of neurons just like in the previous image above.

Training the network:

We will start with a zero weight matrix. The weight matrix will be 25×25. We will initialize the values of the neurons with the pattern above; again, the blue neurons have the value +1 and the white ones have the value -1. Now that the network's neurons have been initialized, we want the net to learn this particular pattern. To do this, we apply the Hebbian learning rule that we discussed before. So every weight between two "blue" neurons will be +1, every weight between two "white" neurons will also be +1 (-1 × -1 = +1), and all the other weights, between a "blue" neuron and a "white" one, will be -1 (+1 × -1 = -1) :P

Take a look at the next three pictures below:

….and that's it. We will end up with a matrix like this:
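The whole training step (zero matrix, one outer product, zero diagonal) is only a few lines of code. Here is a sketch, with my own hypothetical ±1 encoding of the T pattern on the 5×5 grid:

```python
import numpy as np

# The letter "T" on a 5x5 grid: +1 for "blue" squares, -1 for "white" ones.
T = np.array([
    [ 1,  1,  1,  1,  1],
    [-1, -1,  1, -1, -1],
    [-1, -1,  1, -1, -1],
    [-1, -1,  1, -1, -1],
    [-1, -1,  1, -1, -1],
]).flatten()

# One-shot Hebbian learning: start from a zero 25x25 weight matrix and add
# w_ij = T_i * T_j, so same-colour pairs get +1 and mixed pairs get -1.
W = np.zeros((25, 25))
W += np.outer(T, T)
np.fill_diagonal(W, 0)   # no self-connections

print(W.shape)   # (25, 25)
print(W[0, 2])   # two "blue" neurons in the top row -> 1.0
print(W[0, 5])   # a "blue" vs a "white" neuron      -> -1.0
```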

Now what? What’s the point of doing some stupid +/- 1 multiplications ? What is this good for?

To see why this is useful, once we've learned a pattern we have to take an incomplete pattern, do feedforward on each neuron, and see what happens. Remember that by feedforward on a neuron we mean computing the weighted sum of all the other input neurons. If the sign of the sum is the same as the value of the neuron, nothing happens; otherwise the sign of the neuron changes to match the sign of the weighted sum.

Let’s suppose that after learning the weights for the T pattern, we initialize the network with these values:

That looks more or less like the pattern that we've learned before. But there is a noisy blue neuron in the bottom left of the image, and the T pattern is missing 3 blue neurons. Now let's do a feedforward pass on the red-circled neuron in the image below, using the learned weights:

The circled neuron now has the value -1

Just to recap, the feedforward formula was this :

Given the weights learned from the previous pattern using Hebbian learning, where this neuron had the value +1, the weight between this neuron and any "white" neuron is -1, and the weight between this neuron and any "blue" neuron is +1. If we do the sum, we see that every "white" neuron multiplied by its -1 weight contributes a +1. The same goes for the blue neurons, since their value is +1 and the weight between them and this neuron is also +1. The only exceptions are the other two missing neurons from the T pattern, since they now have a value of -1, and the noisy blue neuron in the bottom-left corner: its value is +1 and the weight between it and the red-circled neuron is -1, so the resulting product is -1. So of the 24 terms, 21 contribute +1 and 3 contribute -1, and the weighted sum comes out to 18. That's positive, and the value of our circled neuron is -1, so we have to change its value to match the sign of the weighted sum. And we get a picture like this:

The same thing will happen to the other two missing blue neurons from the original T pattern: they will flip to match their weighted sums, and this way the original T pattern will be reconstructed. For the rest of the neurons, both white and blue, the weighted sum will match their sign, so nothing will happen; they are stable. But we still have one more to go, and I'm talking about that noisy blue neuron that was not part of the original pattern:

In the original pattern its value was -1. The learned weight between it and every white neuron is +1, and the weight between it and every blue neuron is -1. So a -1 white neuron multiplied by a +1 weight yields a negative value, and a +1 blue neuron multiplied by a negative weight also yields a negative value. Doing the math, we get a very negative sum, but the current value of the neuron is +1. So we flip this neuron as well; its value becomes -1, and the final reconstructed pattern is identical to the one we've learned.

So what we have done is reconstruct the original learned pattern from a corrupted and noisy one. Or, put another way, we have associated this bogus pattern with the original one that we've memorized.
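The whole recall walkthrough can be reproduced end to end in code. This sketch uses my own hypothetical encoding of the scenario: the T's stem pixels at flattened indices 7, 12 and 17 are deleted, and a noisy blue pixel is added at index 20, the bottom-left corner:

```python
import numpy as np

# "T" pattern on a 5x5 grid (+1 = blue, -1 = white), flattened to 25 values.
T = np.array([
    [ 1,  1,  1,  1,  1],
    [-1, -1,  1, -1, -1],
    [-1, -1,  1, -1, -1],
    [-1, -1,  1, -1, -1],
    [-1, -1,  1, -1, -1],
]).flatten()

W = np.outer(T, T).astype(float)  # one-shot Hebbian learning
np.fill_diagonal(W, 0)

# Corrupt the pattern: delete three "blue" pixels of the stem and add
# one noisy "blue" pixel in the bottom-left corner.
s = T.copy()
s[[7, 12, 17]] = -1   # three stem pixels go missing
s[20] = 1             # spurious blue pixel

# Let the network evolve: flip any neuron whose sign disagrees with the
# sign of its weighted sum, until nothing changes.
changed = True
while changed:
    changed = False
    for i in range(25):
        new = 1 if W[i] @ s >= 0 else -1
        if new != s[i]:
            s[i], changed = new, True

print((s == T).all())   # True: the original T is reconstructed
```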

Learning more than just one pattern

Let’s suppose that now we want to learn another pattern. Say an H letter:

So we will initialize the neurons in the net with the values from the image above (again, blue = +1, white = -1) and apply Hebbian learning just as we did in the previous example. But we cannot apply it exactly like we did when we learned the first pattern, where we just multiplied the neuron values: if we did that here, we would learn the new H pattern but erase the old weight values that memorize the first T pattern. So what we'll do is apply Hebbian learning but add the result to the previous values of the weights.

In general, if we want to store K patterns in a Hopfield network, the Hebbian learning rule becomes: w_ij = Σ_k a_i^k · a_j^k

where superscript k does not indicate exponentiation but rather the values of those neurons in the kth pattern.
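A sketch of this accumulating rule (the function name and toy patterns are mine):

```python
import numpy as np

def train(patterns):
    """Hebbian learning over K patterns: w_ij = sum_k a_i^k * a_j^k.
    Each new pattern adds to the weights instead of overwriting them."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)   # accumulate one outer product per pattern
    np.fill_diagonal(W, 0)    # still no self-connections
    return W

# With two stored +/-1 patterns, every weight ends up in {-2, 0, +2},
# which is exactly the kind of "weight map" discussed below.
patterns = np.array([[1, 1, -1, -1],
                     [1, 1, 1, -1]])
W = train(patterns)
print(np.unique(W))   # the only weight values are -2, 0 and +2
```

A weight of +2 means the two neurons agreed in both patterns, 0 means they agreed in one and disagreed in the other, and -2 means they disagreed in both.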

Now let’s use the following pattern as input :

and let the network reconstruct it.

But before we proceed, let's look at this image of the so-called weight map. Let's say we want to compute the weighted sum of the red-circled neuron in the image below:

This is how the weights look after learning the second pattern:

  1. All the weights between the red circled neuron and the neurons in the yellow box are 2
  2. All the weights between the red circled neuron and the neurons in the green box are 0
  3. All the weights between the red circled neuron and the neurons in the brown box are -2

To understand this, just imagine overlapping the first pattern, the T, and look at where the red-circled neuron sits in relation to that pattern and to the current one; then sum up the weights.

Computing the weighted sum for that red-circled neuron will yield a positive sum, and since its value is negative we have to flip it. Doing the same for the other two "missing" neurons from the H pattern will eventually lead us back to the original pattern.

Spurious minima

So, as we've just seen, a simple Hopfield network can learn several patterns in a very simple, one-shot fashion. Obviously that looks very promising, until we realize that for this network of 25 neurons we can learn at most 3 patterns; maybe 4 if we are lucky, and by that I mean if the patterns are "far enough" from one another. So that does not seem very efficient. The other problem we encounter if we try to learn more patterns is the so-called spurious minima. These are patterns that the net memorized even though we did not want them memorized, and they usually look like combinations of some of the patterns that we did want to learn. In mathematical terms, they are just local minima in the surface of the energy function, and if we initialize the network with certain values, the network can evolve into such a local minimum and get trapped in it.

That "deep" minimum in the image above is one of the learned patterns, whereas those small pits are spurious minima. But I do not feel we can cover this here; maybe in a future post. If you feel like exploring more about Hopfield nets, check out this CMU course, which is by far the best and has the most in-depth explanations:

Part 1: https://www.youtube.com/watch?v=yl8znINLXdg

Part 2: https://www.youtube.com/watch?v=LtGdn9h5OSQ

What does this have to do with our brains after all?

In our brains we have a structure called the hippocampus, which is part of the limbic system. Although much about the hippocampus is still unknown, it is more or less well understood that it plays a central role in spatial memory, navigation and episodic memory formation. It is also believed to play a major role in learning facts, for instance.

First let’s establish what episodic memory is and what it is not.

In the brain we have several types of memory. You could classify them broadly into implicit and explicit memories. The implicit memories are usually the things that you do without thinking, such as walking, eating, or skills that you've learned over a very long period of time, such as driving or swimming. Much of this implicit/procedural memory is stored in the cerebellum. You might be confused about why walking or eating is considered a "memory". What I mean by memory here is that the brain has to store somewhere, and remember, how to walk properly without falling, etc.

We also have explicit memories. These are the memories that you are aware of when you try to recall them. Some are facts, such as "the capital of Germany is Berlin"; these are semantic memories. The other type of explicit memory is episodic memory. These are the memories we have about events in the past, such as your graduation party, or events from your childhood such as that moment when your dad caught you smoking. But they also include things like what you ate yesterday, who you met a week ago, or what you did just 5 minutes ago.

The hippocampus seems to be the seat of the episodic memories. But the hippocampus does not store any memories itself; it acts as an index for all episodic memories. Episodic memories are composed of one or more kinds of sensory information: visual, auditory, olfactory, gustatory and tactile. All this sensory information travels to the brain via the nervous system and is processed and stored in the cortex.

All these cortical areas also send their highly processed and compressed signals about the sensory information to the hippocampus via the entorhinal cortex, which acts as a gateway to the hippocampus. In the hippocampus, more specifically in the CA3 region, this sensory information from the cortical areas is associated together, and that is done with a Hopfield-like neural net inside CA3.

So let's say that yesterday, while you were waiting for the bus at the bus stop, it was raining pretty badly and suddenly you heard a terrible noise: a car accident. You will definitely remember that experience. All the sensory information about this experience, the very loud noise of the car crash, the rain, and all the other visual information such as the location where it happened, the bus stop itself, the buildings around, etc., will be stored in the neocortex in different sensory areas, but it will be associated in the hippocampus. Next time you pass that street, you will most likely remember that experience: that while you were waiting for the bus and it was raining hard outside, a car crash happened. What this means is that one visual sensory cue (the location) has activated a stored pattern (or index) in the Hopfield-like area of the hippocampus called CA3, which in turn activates all the other neocortical regions (auditory etc.) that participated in forming that memory. Schematically, the image below shows the whole flow of memory formation and then memory recall from just an incomplete pattern of cortical activity:

Also check out this short YouTube clip for a great introduction to the hippocampus.

So as a final word, I want to show a diagram of the whole cortical-hippocampal loop and exactly what we've focused on. The area that we've covered in this post, the CA3 area, which acts as an auto-associative Hopfield-like network, is just the tiny square in the bottom-left corner of the image below:

As you can see, this is quite complicated. This is the hippocampus-cortical loop. But the hippocampus is a giant recurrent neural net itself, and within it, CA3 is a highly recurrent loop.

