A brief history of Neural Networks, Generative AI & Spiking Neural Networks.
ChatGPT is amazing; it is not just making us more productive, it has also popularized AI. AI is the new electricity: it is hyped, talking about it feels good, people are doing cool things with it, and if you want funding for your startup or app, mentioning AI might get you that funding quickly.
Large Language Models like ChatGPT are powerful. For example, in Portugal, the Center for Responsible AI https://centerforresponsible.ai/ has a whole list of AI products they are working on. One such product, Halo, is enabling ALS patients to talk again. Priberam is increasing doctors’ efficiency by extracting useful information from medical text, and Affine is helping businesses and individuals navigate complex legal requirements.
But how did we get here? It took us over seven decades to reach where we are today, and like every journey, we had ups and downs along the way. Let’s start with the 1940s and 1950s, which I like to call the era of curiosity. In 1943, McCulloch and Pitts proposed the first mathematical model of a neuron. In 1950, Alan Turing proposed the Turing test, and in 1958, Rosenblatt created the perceptron, a fundamental model in neural networks. From these milestones, it is clear that this was the time when researchers were trying to find a link between the brain, computational models, and artificial intelligence.
The perceptron created hype around AI, smaller than but similar to what ChatGPT has created today. But then came Minsky, who, together with Papert, wrote the book “Perceptrons: An Introduction to Computational Geometry” and provided mathematical proofs of the limitations of what perceptrons can do. Although the book stated that this critique did not apply to multilayer networks, that fact was largely ignored, and interest in neural network research faded until 1986.
In 1986, a study titled “Learning Internal Representations by Error Propagation” was published. It showed that we could propagate errors backward through a network and adjust the weights to minimize them, enabling neural networks to learn complex internal representations and features, and to generalize from training data to new data. Sadly, Rosenblatt had died in a boating accident in 1971 and never saw this groundbreaking achievement built on top of his work.
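To make the idea concrete, here is a deliberately tiny sketch of learning by error propagation on a one-weight “network”. Everything in it (the single weight, the learning rate, the toy example) is illustrative; real backpropagation applies the same chain rule layer by layer.

```python
# A toy version of learning by error propagation: nudge the weight
# against the gradient of the squared error until w * x hits the target.
w = 0.0                  # initial weight
x, target = 2.0, 4.0     # one training example: we want w * x == target
lr = 0.1                 # learning rate

for step in range(20):
    y = w * x            # forward pass
    error = y - target   # prediction error
    grad = error * x     # d(0.5 * error**2) / dw, via the chain rule
    w -= lr * grad       # adjust the weight to reduce the error

print(w)                 # approaches 2.0, so w * x approaches the target
```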
Neural networks are at the core of transformer models, which are at the core of large language models like ChatGPT. In 2017, a study titled “Attention is All You Need” was published, introducing transformers to the world. Transformers introduced self-attention, which lets the model weigh the importance of every word in a sentence relative to every other word, and positional encoding, which tells the model where each word sits in the sequence. The architecture delivered superior performance on various NLP tasks, making this a truly groundbreaking study in the field of NLP.
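As a rough illustration, here is a minimal NumPy sketch of the scaled dot-product self-attention at the heart of transformers. All names and values are made up for the example; a real transformer adds learned projections per head, multiple heads, positional encodings, and masking.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the inputs into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Each word scores every other word; scaling keeps the softmax stable.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Each word's output is a weighted mix of all value vectors.
    return softmax(scores) @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # 4 "words", 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)   # (4, 8)
```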
Only at this point did we have all the ingredients needed to build generative AI models. Over the last few years, many models have been created, and their performance has improved remarkably over time. Here is the timeline of those models; new models and improvements to existing ones are appearing so fast that it’s hard to keep track of them.
Generative AI has its own challenges:
- Bias and Fairness
- Misinformation
- Resource Intensity
- Scalability
- Consistency over longer text
- Hallucinations
- Copyright Infringements
- Economic impacts like job displacement
- Digital Divide
Here I am listing only some of them; trust me, there are more. Bias and fairness is my favorite one. I’m sure you all remember when the internet was flooded with social media posts about images where someone asked an AI to draw the founding fathers of America and it produced a diverse set of characters.
One of the challenges from this list that I want to focus on is how resource-intensive models like ChatGPT are. For example, a Google query takes about 0.0003 kWh, while one ChatGPT query is estimated to take up to 0.01 kWh, more than 30 times as much. One image generation can use as much electricity as charging one iPhone. In a few years, powering AI could surpass the electricity use of smaller countries like Ireland.
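For a quick sanity check on those figures, here is the back-of-the-envelope arithmetic; the per-query numbers themselves are rough public estimates, not measurements.

```python
# Back-of-the-envelope comparison using the figures quoted above.
google_query_kwh = 0.0003   # rough estimate for one Google search
chatgpt_query_kwh = 0.01    # rough upper estimate for one ChatGPT query

ratio = chatgpt_query_kwh / google_query_kwh
print(f"One ChatGPT query ~= {ratio:.0f}x one Google search")  # ~33x
```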
Researchers are working to find different solutions for this, which involve energy-efficient data centers, green and sustainable practices like using renewable energy, creating new chips, and optimizing hardware. My team at CISUC and I have a different approach. We want to create an efficient architecture and make algorithmic improvements to achieve this goal.
The key lies in biological neural systems. The human brain is very efficient, running on roughly 20 watts of power. Researchers are working on AI models called spiking neural networks that mimic biological neurons and how they process information.
Let’s take a look at a biological neuron. It receives input at connection points called synapses. The cell body, called the soma, generates a spike if its membrane potential reaches the action-potential threshold. If it spikes, that spike is passed on to the next neuron through the axon. The spike is an electric current, which the cell produces by letting ions (dissolved salts in the fluid around neurons) flow into the cell. The next neuron will only activate if it receives enough input from the previous neurons to trigger its own action potential. This makes the system super efficient, because not all neurons are active all the time.
Here are two neurons side by side: figure (a) is a neuron of a traditional neural network, and figure (b) is a neuron of a spiking neural network. The neuron in figure (a) receives continuous activations as input. Each input is multiplied by a weight, the weighted inputs are summed, a bias is added, and an activation function squashes the result into a certain range, as in the sketch below. It’s layer-by-layer multiply-and-accumulate.
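A minimal sketch of the figure (a) neuron, assuming a sigmoid activation for illustration; the inputs, weights, and bias below are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def artificial_neuron(x, w, b):
    # Multiply-and-accumulate, then squash: activation(w . x + b).
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.2, 0.8])   # continuous activations from the previous layer
w = np.array([0.4, 0.3, -0.9])   # learned weights
print(artificial_neuron(x, w, b=0.1))  # a continuous value in (0, 1)
```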
In figure (b), the spiking neuron receives spike trains, where a “1” represents a spike and a “0” represents no spike. A weight is applied to these spikes. If the soma accumulates enough weighted spikes from previous neurons to reach the action potential, it fires a spike of its own. This behavior is commonly reproduced with a model such as the leaky integrate-and-fire (LIF) neuron, sketched below.
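Here is a tiny LIF sketch for a single input synapse. The leak factor, threshold, weight, and reset-to-zero behavior are all illustrative choices; practical SNN libraries expose many variants of this model.

```python
def lif_neuron(spike_train, weight, beta=0.9, threshold=1.0):
    """Integrate weighted input spikes into a leaky membrane potential;
    fire (output 1) and reset whenever the potential crosses the threshold."""
    potential = 0.0
    output = []
    for s in spike_train:                          # s is 1 (spike) or 0 (no spike)
        potential = beta * potential + weight * s  # leak a little, then integrate
        if potential >= threshold:                 # action potential reached
            output.append(1)
            potential = 0.0                        # reset after firing
        else:
            output.append(0)
    return output

print(lif_neuron([1, 0, 1, 1, 0, 1, 1, 0], weight=0.45))
# -> [0, 0, 0, 1, 0, 0, 0, 0]: the neuron stays silent until enough
#    spikes arrive close together, which is what keeps SNNs sparse.
```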
The benefits are clear: firing a spike is a binary event, while traditional neurons use continuous activations. Spiking neurons are also more biologically plausible (closer to how the biological brain works) and can be used efficiently on neuromorphic hardware, which is designed specifically for these types of neural networks.
Here arises a natural question: why not use SNNs to replace the fully connected neural networks used in transformers and make them efficient? Well, that’s mainly because of several challenges. For example, spiking neural networks are complex to train, partly because of learning rules like spike-timing-dependent plasticity (STDP): neurons that fire together wire together, which adds a temporal dimension to these models (a toy version of the rule is sketched below). And while I believe neuromorphic hardware, and models designed to run on it, would make it possible to run these networks on devices like our computers and cellphones, that hardware is far more complex and sophisticated, with few options available.
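To give a flavor of what “fire together, wire together” means in code, here is a toy STDP weight update for a single synapse. The learning rates and time constant are illustrative, and real training schemes, such as surrogate-gradient backpropagation, are considerably more involved.

```python
import math

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Strengthen the synapse if the presynaptic spike precedes the
    postsynaptic one (causal), weaken it if the order is reversed."""
    dt = t_post - t_pre      # relative spike timing, e.g. in milliseconds
    if dt > 0:               # pre fired before post: potentiate
        return w + a_plus * math.exp(-dt / tau)
    else:                    # post fired before (or with) pre: depress
        return w - a_minus * math.exp(dt / tau)

w = 0.5
print(stdp_update(w, t_pre=10.0, t_post=15.0))  # weight increases
print(stdp_update(w, t_pre=15.0, t_post=10.0))  # weight decreases
```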
Some recent research works like SpikeGPT and Spikeformer offer a starting point for these studies and have shown amazing results in terms of energy efficiency. Hopefully, my research team and I will contribute as well and devise a biologically inspired model that is efficient and overcomes these challenges.