Spiking Neural Networks

Visualization of spiking neurons (from https://spectrum.ieee.org/four-types-of-brain-cells-identified-based-on-electrical-spiking-activity)

Artificial Neural Networks (ANNs) have achieved tremendous success over the last decade in domains such as image classification (Krizhevsky et al.) and language modeling (Brown et al.). However, ANNs still face major challenges, namely data efficiency, computational efficiency, and generalization (Chollet; Thompson et al.).

How can we improve neural networks to account for these deficits? We can look to the brain! The brain generalizes remarkably well, and it is extremely data- and power-efficient (Lake et al.). So, by studying how the brain propagates information and learns, perhaps we can improve how ANNs learn.

In the next two sections, we will address two biologically implausible aspects of ANNs: the neuron model and the learning rule. Then, we will use a new learning rule and a new computational model of a neuron as the building blocks of a new neural network architecture: the Spiking Neural Network (SNN).

Section 1: A More Realistic Neuron

Current ANNs were originally inspired by the biological brain, but research shows that they are biologically implausible (Illing et al.). One reason is that each neuron in an ANN operates on a vector of scalar inputs, so information is passed through the magnitude of the inputs. Additionally, this computation is synchronous; a neuron cannot pass information on until it has received all of its inputs.

On the other hand, biological neurons communicate in spikes. When a neuron’s voltage passes some threshold, it fires an action potential which passes a signal to neighboring neurons. Importantly, each spike has (roughly) the same magnitude and the magnitude does not seem to encode any information (Brette). The information is encoded by the rate and pattern of the spikes. In contrast to ANN neurons, this computation is completely asynchronous.

Let’s build a computational model of a neuron that communicates in spikes and encodes data temporally. To accurately model how a biological neuron fires action potentials, we have to model the voltage of the neuron because it largely determines the activity of the neuron; neurotransmitters, voltage-gated ion channels, and action potentials depend on and influence the cell’s voltage (Hodgkin and Huxley).

In 1952, A.L. Hodgkin and A. F. Huxley developed an equivalent circuit to model a single neuron. The circuit was equivalent to a neuron in that the voltage of the circuit responded to injected current in the same way a single neuron would (Hodgkin and Huxley)!

They created this model using the setup pictured below: they isolated the giant axon of a squid and injected current, then measured how the voltage of the cell changed in response.

Finally, they set to work creating a circuit that had the same response characteristics as the squid axon when current was injected.

The Hodgkin-Huxley Model

There are many different components to consider when building a circuit that matches the voltage of a neuron. We have to model how charge builds up across a neuron’s cell membrane, how it flows through ion channels, and how those ion channels respond to changes in voltage.

First, let’s build an equivalent circuit model for an oversimplified neuron that has no ion channels. In this view of the neuron, the cell membrane acts as a capacitor. A capacitor is a circuit component in which two conductors are separated by an insulator. As current is injected into a capacitor, charge builds up on one side because the current cannot cross the insulator.

In this view, the neuron is like a capacitor: the fluids inside and outside the cell act as conductors, since ions move freely in solution, while the cell membrane (which does not let ions pass through directly) acts as an insulator.

In this oversimplified view of the neuron, we can build the following equivalent circuit:

So, how does this circuit respond to injected current?

While current is injected, the cell voltage increases. Even with this very simple model, we capture a form of neuronal memory: after the injected current stops, the cell voltage stays high.
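To make this concrete, here is a minimal simulation sketch of the capacitor-only membrane, where C · dV/dt = I(t); the parameter values and pulse timing are illustrative, not taken from the article:

```python
import numpy as np

# Capacitor-only membrane: C * dV/dt = I(t)
C = 1.0          # membrane capacitance (arbitrary units, illustrative)
dt = 0.1         # time step (ms)
T = 100.0        # total simulated time (ms)
steps = int(T / dt)

V = np.zeros(steps)          # membrane voltage over time
I = np.zeros(steps)          # injected current over time
I[100:400] = 0.5             # rectangular current pulse from 10 ms to 40 ms

for t in range(1, steps):
    # Voltage rises while current is injected and holds its value afterwards
    V[t] = V[t - 1] + (I[t] / C) * dt
```

Running this shows the voltage ramping up linearly during the pulse and then holding its value once the current stops.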

However, the linear relationship between injected current and cell voltage is unrealistic. In reality, if current is continually injected, the voltage will eventually saturate. This is because ion channels in the cell membrane allow ions to leak across it.

We can represent these ion channels with a new, alternative pathway in the circuit. On this pathway, we need to model two main qualities: the resistance of the ion channel and its equilibrium potential. The resistance models how easily charges flow through the channel. The equilibrium potential is the voltage at which diffusion forces and electrical forces exactly balance. In humans, this equilibrium potential is around -60 mV (Chen and Lui).

In our equivalent circuit, we can model the resistance of the channel with a resistor, and we can model the equilibrium potential with a battery. We can now build an equivalent circuit that models static ion channels.

With this new equivalent circuit, the cell voltage now saturates; it no longer has the unrealistic linear relationship between cell potential and injected current.
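Extending the sketch above with a static leak channel (the resistor-and-battery pathway) gives exactly this saturating behavior; again, R and E_L are illustrative values:

```python
import numpy as np

# RC membrane with a static leak channel:
# C * dV/dt = -(V - E_L) / R + I(t)
C, R, E_L = 1.0, 10.0, -60.0   # capacitance, leak resistance, equilibrium potential (mV)
dt, T = 0.1, 100.0
steps = int(T / dt)

V = np.full(steps, E_L)        # start at the equilibrium potential
I = np.zeros(steps)
I[100:700] = 1.5               # sustained current injection from 10 ms to 70 ms

for t in range(1, steps):
    dV = (-(V[t - 1] - E_L) / R + I[t]) / C
    V[t] = V[t - 1] + dV * dt  # voltage now saturates near E_L + I*R instead of growing forever
```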

Although the cell’s response is now more realistic, the model is still not complete; it cannot yet produce action potentials. In reality, ion channels are voltage-dependent, time-dependent, and ion-selective (Hodgkin and Huxley), whereas our circuit so far only captures the average behavior of all open channels. So, we need to represent these dynamic ion channels explicitly.

In the Hodgkin-Huxley model, voltage-dependent ion channels are included for sodium (Na+) and potassium (K+), two ions that are central to generating action potentials. We can represent these channels much like the static ion channels: alternative pathways, each with a resistor in series with a battery. A key difference is that the battery voltage now matches the equilibrium potential of that specific ion, and the resistance is dynamic (represented by the arrow in the circuit diagram).

This is the entire Hodgkin-Huxley model! Using this model of a neuron, we can mimic some important properties that neurons have. Below, you can see how the voltage change of an action potential can be modeled by changing the conductance, the inverse of the resistance, of the sodium and potassium ion channels.
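For readers who want to experiment, here is a sketch of the full Hodgkin-Huxley model using the conventional textbook parameters and rate functions (shifted so the resting potential sits near -65 mV); these constants come from the standard formulation rather than from values quoted in this article:

```python
import numpy as np

# Standard Hodgkin-Huxley parameters (voltage in mV, time in ms)
C_m = 1.0                              # membrane capacitance (uF/cm^2)
g_Na, g_K, g_L = 120.0, 36.0, 0.3      # maximum conductances (mS/cm^2)
E_Na, E_K, E_L = 50.0, -77.0, -54.387  # reversal potentials (mV)

# Voltage-dependent opening/closing rates for the gating variables m, h, n
def alpha_m(V): return 0.1 * (V + 40.0) / (1.0 - np.exp(-(V + 40.0) / 10.0))
def beta_m(V):  return 4.0 * np.exp(-(V + 65.0) / 18.0)
def alpha_h(V): return 0.07 * np.exp(-(V + 65.0) / 20.0)
def beta_h(V):  return 1.0 / (1.0 + np.exp(-(V + 35.0) / 10.0))
def alpha_n(V): return 0.01 * (V + 55.0) / (1.0 - np.exp(-(V + 55.0) / 10.0))
def beta_n(V):  return 0.125 * np.exp(-(V + 65.0) / 80.0)

dt, T = 0.01, 50.0
steps = int(T / dt)
V, m, h, n = -65.0, 0.05, 0.6, 0.32    # initial conditions near rest

trace = []
for step in range(steps):
    I_ext = 10.0 if 5.0 <= step * dt <= 40.0 else 0.0   # injected current (uA/cm^2)

    # Ionic currents through the dynamic Na+ and K+ channels and the static leak
    I_Na = g_Na * m**3 * h * (V - E_Na)
    I_K  = g_K * n**4 * (V - E_K)
    I_L  = g_L * (V - E_L)

    # Euler updates for the membrane voltage and the gating variables
    V += dt * (I_ext - I_Na - I_K - I_L) / C_m
    m += dt * (alpha_m(V) * (1.0 - m) - beta_m(V) * m)
    h += dt * (alpha_h(V) * (1.0 - h) - beta_h(V) * h)
    n += dt * (alpha_n(V) * (1.0 - n) - beta_n(V) * n)
    trace.append(V)
# `trace` now contains a train of action potentials driven by the current pulse
```

Driving the model with a sustained current pulse produces repeated action potentials, reproducing the conductance-driven voltage changes described above.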

Leaky Integrate and Fire (LIF)

How do we use the Hodgkin-Huxley model to create spikes in a computationally efficient way? We could change the sodium and potassium conductances to change the neuron voltage as shown above, but this detailed relationship between ion channel conductances and cell voltage is unnecessary if we just want to create spikes.

Instead, we can abstract away the voltage-dependent ion channels with a “spike generator.” Then, we can set a voltage threshold where, when the voltage of the cell exceeds it, we mark this as a spike and reset the voltage. This is called the Leaky Integrate and Fire (LIF) model (Brunel and van Rossum).
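Here is a minimal LIF sketch; the resting, threshold, and reset voltages are illustrative choices, not parameters from the article:

```python
import numpy as np

# Leaky integrate-and-fire: tau * dV/dt = -(V - V_rest) + R * I(t)
tau, R = 10.0, 1.0                                 # membrane time constant (ms) and resistance
V_rest, V_thresh, V_reset = -65.0, -50.0, -65.0    # mV, illustrative values

dt, T = 0.1, 100.0
steps = int(T / dt)
V = V_rest
spikes = []

for step in range(steps):
    I = 20.0 if 10.0 <= step * dt <= 90.0 else 0.0  # injected current
    V += dt / tau * (-(V - V_rest) + R * I)         # leaky integration of the input
    if V >= V_thresh:                               # threshold crossed:
        spikes.append(step * dt)                    # record a spike time...
        V = V_reset                                 # ...and reset the voltage
```

Each time the voltage crosses the threshold, a spike time is recorded and the voltage is reset, which is all the downstream network needs to see.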

We now have a much better model of a neuron than the one used in ANNs. Our model maintains the state of the cell voltage and uses this voltage to generate spikes. In the next section, we will address another biological implausibility of ANNs: the learning rule.

Section 2: A More Realistic Learning Rule

The learning algorithm at the core of nearly all modern neural networks, backpropagation, requires “symmetric weights”: the weights used to propagate information forward must be reused (transposed) to propagate error backward when updating each neuron. This symmetry has not been observed in the brain, and there is no known brain mechanism that would give rise to it (Whittington and Bogacz).

The exact learning rule used by biological neurons is not known, but there are biologically plausible proposals. One of them is Spike-Timing-Dependent Plasticity (STDP). In STDP, learning updates are local and depend on the relative firing times of two connected neurons.

If the presynaptic neuron (the neuron sending the signal) fires an action potential and the postsynaptic neuron (the neuron receiving the signal) fires immediately afterward, then the connection between them is strengthened. Conversely, if the postsynaptic neuron fires before the presynaptic neuron, the connection is weakened. This simple, local learning rule avoids the weight-symmetry issue of backpropagation (Tavanaei et al.).

We now have the basic ingredients for more realistic neurons; the neuron communicates in spikes and has a learning rule that is biologically plausible. We will use this model of the neuron to build a new neural architecture called Spiking Neural Networks.

Section 3: Spiking Neural Networks

Spiking Neural Networks (SNNs) differ from traditional Artificial Neural Networks (ANNs) in that trains of discrete spikes, instead of scalar values, are passed through the network. Inputs (e.g. images, videos) are encoded into spikes, and spikes are passed layer by layer through the SNN. The last output layer is a fully-connected readout layer, where spikes are decoded into a result (e.g. a classification).
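As a concrete example of the encoding step, here is one common way to rate-encode an image into spike trains (a Poisson-style sketch; the function name and parameters are ours, not from any particular SNN library):

```python
import numpy as np

def rate_encode(image, n_steps=100, max_prob=0.5, rng=None):
    """Turn pixel intensities in [0, 1] into a binary spike train.

    Brighter pixels fire with higher probability at each time step,
    so the firing rate encodes the intensity (rate coding)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.clip(image, 0.0, 1.0) * max_prob               # per-step firing probability
    return (rng.random((n_steps,) + image.shape) < p).astype(np.uint8)

# Example: encode a random 28x28 "image" into 100 time steps of spikes
spike_train = rate_encode(np.random.rand(28, 28))
print(spike_train.shape)   # (100, 28, 28)
```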

To some, SNNs are considered the next generation of neural networks: neurons communicate layer by layer using spikes which can, ideally, encode information in both the firing rate (rate coding) and the precise firing pattern (temporal coding). Additionally, the biologically plausible neuron model (LIF) and learning rule (STDP) explained in earlier sections replace the biologically implausible neuron model and learning rule (backpropagation) of ANNs.

Communication using spikes is interesting for reasons other than biological plausibility; spike trains naturally encode temporal information (via the timing and rate of a neuron’s spikes). This representation is well suited to streaming input (video, audio, sensory input, streaming text, etc.), which is much more applicable to real-world scenarios than static inputs such as single images. SNNs are also efficient; asynchronous computation allows for power and time savings. When implemented on special, more brain-like hardware called neuromorphic hardware, SNNs can be very power efficient.

Among many models and experiments, there are two main techniques that perform well: training an ANN using backpropagation and then converting it to a SNN, and training a SNN using the STDP algorithm. We’ll take a look at both.

Converting ANN to SNN

Researchers have developed methods to get the benefit of both backpropagation and SNNs by fully training an ANN, creating a SNN with the same architecture, and then normalizing the resulting weights and activations of the ANN for use in the SNN. For example, to convert a Convolutional Neural Network (CNN), the same model architecture is used for both the artificial and the spiking CNNs, but the inputs into the SNN are rate encoded into spike trains, and the weights from the trained ANN are then used in the spiking CNN after some approximations (Tavanaei et al.).
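One ingredient that often appears in such conversions is weight normalization: the trained ANN’s weights are rescaled using activations recorded on a calibration set, so that firing rates in the SNN stay in a usable range. The sketch below shows one data-based variant under that assumption; the function and variable names are ours, and exact schemes differ between conversion methods:

```python
import numpy as np

def normalize_weights(layer_weights, layer_activations):
    """Rescale trained ANN weights, layer by layer, using the maximum
    activation each layer produced on a calibration set.

    layer_weights:     list of weight matrices from the trained ANN
    layer_activations: list of activation arrays recorded on calibration data
    """
    normalized = []
    prev_scale = 1.0
    for W, acts in zip(layer_weights, layer_activations):
        scale = float(np.max(acts))                 # largest activation in this layer
        normalized.append(W * prev_scale / scale)   # keep layer-to-layer ratios consistent
        prev_scale = scale
    return normalized
```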

This conversion takes advantage of the SNN’s power savings if the model is then deployed on specialized hardware. However, approximating the ANN’s parameters decreases performance in the spiking model. Additionally, since the ANN is used for training, the learning procedure is still not biologically plausible. Can we do better by training SNNs directly?

Local learning using STDP

Yes! We can train a SNN using the biologically plausible STDP algorithm from Section 2.

In a spiking CNN, layers of excitatory and inhibitory neurons are set up to enable STDP-based learning. Each neuron behaves like a LIF model, firing a spike once its voltage threshold is met. Each neuron’s internal voltage is updated by:

V_i(t) = V_i(t−1) + Σ_j w_ij · S_j(t−1)

where V_i(t) is the internal potential of the i-th neuron at time t, w_ij is the weight between the j-th presynaptic neuron and the i-th neuron, and S_j(t−1) is 1 if the j-th neuron fired at the previous time step and 0 otherwise.
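In code, this update is just a weighted sum over the presynaptic spikes from the previous time step; a minimal sketch (the array shapes are our own convention):

```python
import numpy as np

def update_potentials(V_prev, W, S_prev):
    """V_prev: potentials of the postsynaptic neurons at t-1, shape (n_post,)
    W:      synaptic weights, shape (n_post, n_pre)
    S_prev: binary presynaptic spikes at t-1 (1 = fired), shape (n_pre,)"""
    return V_prev + W @ S_prev   # V_i(t) = V_i(t-1) + sum_j w_ij * S_j(t-1)
```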

The weights between the neurons are adjusted by STDP. In its general form, the weight change depends on the difference between the postsynaptic and presynaptic spike times, Δt = t_post − t_pre:

Δw = A+ · exp(−Δt / τ+) if Δt ≥ 0, and Δw = −A− · exp(Δt / τ−) if Δt < 0

where A+ is the learning rate for potentiation (when the weight is increased), A− is the learning rate for depression (when the weight is decreased), and τ+ and τ− are time constants that set the width of the learning window.

A simplified version of STDP, which depends only on the sign of the spike-time difference rather than its exact value, can be used to adjust the weights, which further saves computation.
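Below is a sketch of such a simplified rule in the spirit of the STDP-trained spiking CNNs discussed here: the sign of the update depends only on whether the presynaptic spike came first, and a w·(1 − w) factor keeps weights within [0, 1]. The function name and learning-rate values are illustrative assumptions, not taken from a specific paper:

```python
import numpy as np

def simplified_stdp(w, pre_fired_first, a_plus=0.004, a_minus=0.003):
    """w:               synaptic weights, assumed to lie in [0, 1]
    pre_fired_first: boolean array, True where the presynaptic spike
                     preceded the postsynaptic spike (potentiation)"""
    # Potentiate where pre fired first, depress otherwise; w*(1-w) soft-bounds the weights
    dw = np.where(pre_fired_first, a_plus, -a_minus) * w * (1.0 - w)
    return np.clip(w + dw, 0.0, 1.0)
```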

At the start of training, weights in the spiking CNN are random. Competition is used to drive different groups of neurons (called neuron maps) to learn different features. When one neuron in an excitatory layer spikes first, a “winner-takes-all” strategy is applied: the inhibitory layer prevents neurons in other maps that share the same receptive field from firing. The winning neuron’s weights are updated using the STDP rule, and the new weights are shared across the neurons in its map for efficiency. This competition between maps drives different maps in each convolutional layer to learn different features. The last layer is a fully connected layer that classifies the resulting spike trains.

Performance

SNN models have been shown to achieve low test errors on MNIST, the classification dataset of black-and-white handwritten digits, and CIFAR-10, a classification dataset of 32x32 color images spanning 10 classes. Training a SNN end-to-end with STDP works well, but converting a trained ANN to a SNN achieves higher accuracy.

Neuromorphic Hardware

As mentioned previously, SNNs are attractive not only because they are more biologically plausible, but also because they are more power efficient when implemented on special hardware called neuromorphic hardware. Neuromorphic hardware has a large number of processors (sometimes on the order of thousands of cores), and each processor has its own clock and local access to memory. Since neuromorphic hardware is much less virtualized than traditional hardware, it is very difficult to train models on neuromorphic hardware as the weight updates have to be realized on the hardware itself. In practice, SNNs are often trained on regular hardware and then adapted to neuromorphic hardware to gain performance benefits during inference (Pfeiffer and Pfeil).

Pros and Cons of SNNs

When implemented on neuromorphic hardware, SNNs can be significantly more energy efficient because they avoid dense matrix multiplications: spike trains are sparse signals, so a neuron’s output is often 0 and triggers no downstream computation. Energy efficiency is a growing concern for deep learning hardware as datasets and model sizes continue to grow at an exponential rate. If SNNs can be improved to perform well on datasets at scale, their computational efficiency would make them a promising alternative to ANNs.

However, SNNs’ performance at scale has not yet been proven. Local learning algorithms using STDP are biologically plausible, and STDP-trained SNNs have performed well on small classification datasets such as MNIST and CIFAR-10. However, they have not performed well on larger datasets such as ImageNet.

Another benefit of SNNs, besides computational efficiency, is biological plausibility. Since neurons in the brain generate and communicate with spikes, it is natural to wonder whether creating a similar system in silicon can lead to more efficient, more effective machine learning. Current research shows that, in many cases, a biologically implausible element must be introduced to reach low error rates, such as training an ANN with backpropagation and then converting it to a SNN (Tavanaei et al.). Even so, spike trains can innately carry spatio-temporal information, which offers another tempting advantage for representing real-world streaming input in machine learning.

Finally, SNNs require specialized neuromorphic hardware; to realize their theoretical computational efficiency, the right hardware must be available. Companies like Intel are actively investing in smaller and faster neuromorphic devices, like the recently announced Loihi 2 chip.

In theory, SNNs have many benefits over ANNs: they are biologically plausible, can be much more computationally efficient, and can perform asynchronous computation. In practice, however, most of these benefits remain unrealized. The best-performing SNNs are created by training an ANN with backpropagation, which is biologically implausible, and then converting it to a SNN. Neuromorphic hardware, which is necessary to realize the energy-efficiency gains of SNNs, is highly specialized and not yet widely adopted. The potential of SNNs excites many, but more work remains before they outperform traditional ANNs.

This article was written by Sam Acquaviva, Jesse Cummings, and Nicole Pang as part of the Fall 2021 MIT class, 6.881: Tissue vs Silicon in Machine Learning.

Bibliography

Brette R (2015) Philosophy of the Spike: Rate-Based vs. Spike-Based Theories of the Brain. Front. Syst. Neurosci. 9:151. doi: 10.3389/fnsys.2015.00151

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … & Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

Brunel, Nicolas & van Rossum, Mark. (2008). Lapicque’s 1907 paper: From frogs to integrate-and-fire. Biological Cybernetics, 97, 337–339. doi: 10.1007/s00422-007-0190-0.

Cao, Y., Chen, Y. & Khosla, D. Spiking Deep Convolutional Neural Networks for Energy-Efficient Object Recognition. Int J Comput Vis 113, 54–66 (2015). https://doi.org/10.1007/s11263-014-0788-3.

Chen I, Lui F. Neuroanatomy, Neuron Action Potential. [Updated 2021 Aug 11]. In: StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing; 2021 Jan-. Available from: https://www.ncbi.nlm.nih.gov/books/NBK546639/

Chollet, F. (2019). On the Measure of Intelligence. ArXiv, abs/1911.01547.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of physiology, 117(4), 500–544. https://doi.org/10.1113/jphysiol.1952.sp004764

Illing, B., Gerstner, W., & Brea, J. (2019). Biologically plausible deep learning — but how far can we go with shallow networks? Neural Networks, 118, 90–101.

Kheradpisheh, S., Ganjtabesh, M., Thorpe, S., & Masquelier, T. (2018). STDP-based spiking deep convolutional neural networks for object recognition. Neural Networks, 99, 56–67. ISSN 0893-6080. https://doi.org/10.1016/j.neunet.2017.12.005.

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25. Curran Associates, Inc.

Lake, B.M., Ullman, T.D., Tenenbaum, J.B., & Gershman, S.J. (2016). Building machines that learn and think like people. Behavioral and Brain Sciences, 40.

Legenstein, R., Pecevski, D., Maass W., A Learning Theory for Reward-Modulated Spike-Timing-Dependent Plasticity with Application to Biofeedback, 2008, PLOS Computational Biology 4(10): e1000180. https://doi.org/10.1371/journal.pcbi.1000180.

Liao, Q., Leibo, J., & Poggio, T. (2016). How important is weight symmetry in backpropagation? In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI’16). AAAI Press, 1837–1844. https://dl.acm.org/doi/10.5555/3016100.3016156.

Masquelier T., Thorpe S.J. (2007) Unsupervised Learning of Visual Features through Spike Timing Dependent Plasticity. PLOS Computational Biology 3(2): e31. https://doi.org/10.1371/journal.pcbi.0030031.

Pfeiffer, M., & Pfeil, T. (2018). Deep learning with spiking neurons: opportunities and challenges. Frontiers in neuroscience, 12, 774.

Shears, Osaze & Yazdani, Ahmadhossein. (2020). Spiking Neural Networks for Image Classification. 10.13140/RG.2.2.27001.80486.

Tavanaei, A., Ghodrati, M., Kheradpisheh, S., Masquelier, T., & Maida, A. (2019). Deep learning in spiking neural networks. Neural Networks, 111, 47–63. ISSN 0893-6080. https://doi.org/10.1016/j.neunet.2018.12.002.

Thompson, N.C., Greenewald, K.H., Lee, K., & Manso, G.F. (2020). The Computational Limits of Deep Learning. ArXiv, abs/2007.05558.

Whittington JCR, Bogacz R. Theories of Error Back-Propagation in the Brain. Trends Cogn Sci. 2019 Mar;23(3):235–250. doi: 10.1016/j.tics.2018.12.005. Epub 2019 Jan 28. PMID: 30704969; PMCID: PMC6382460.
