A Deep Dive into the Functionality of Artificial Vs. Biological Neural Networks

Communication, Noise Processing & Architectural Changes

Ambar Kleinbort
Analytics Vidhya
9 min read · Nov 1, 2019


Highlights:

Biological neural networks are made of oscillators; this lets them filter their inputs, resonate with noise, and retain hidden firing patterns.

Artificial neural networks are time-independent and cannot filter their inputs. After training, they retain fixed and explicit (though hard-to-interpret) firing patterns.

How different types of noise interact with different elements of artificial neural networks is still a wide-open research topic. We know noise can be used to regularize performance and to help gradient descent avoid local minima.

The brain’s ability to restructure its connections is called synaptic plasticity, and one of the mechanisms behind it is Long-Term Potentiation. Artificial neural networks determine their connectivity using gradient descent during training. The gradient is calculated using backpropagation, which we can loosely compare to Hebbian Long-Term Potentiation.

Introduction

The mind is what the brain does. So then, what exactly is the brain doing and what can we build that does something similar? How exactly do these things do what they do?

In this article, we home in on the functionality of biological and artificial neural networks to answer these questions. We look at how neurons and nodes communicate, how they process noise, and how networks change as they learn; these mechanisms are ultimately what lead each network to appropriate choices and behaviors.

If these topics are new to you, look here for an introduction to the structural components of these networks.

Biological Neural Networks

Ultimately, the output of any network depends on the activation pattern of its nodes. At a basic level, we can say a neuron holds an electric potential and will fire if a certain voltage threshold is met. Neurons communicate via the synapse, a gap between cells where signals become chemical. Some neurotransmitters tell the postsynaptic neuron to fire, others tell it not to. Whether the following neuron actually fires is ultimately determined by its membrane potential.

Artificial Neural Networks

A node, on the other hand, holds a numerical value called its ‘activation’: the larger this number, the stronger the node’s effect on the following one. Typically, a sigmoid function keeps activation values between 0 and 1, and a bias term shifts the activation function left or right. Nodes are usually organized in layers, where every node in a layer is connected to each node in the previous layer. ‘Weights’ determine the strength of these connections, and just like the chemicals at a synapse, they can send a positive (fire) or negative (don’t fire) message to the following node.
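As a rough sketch of that arithmetic (in Python with NumPy; the activations, weights, and bias below are made-up numbers, not anything trained):

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-z))

def node_activation(inputs, weights, bias):
    # Weighted sum of the previous layer's activations, shifted by the
    # bias, then squashed so the result stays between 0 and 1.
    return sigmoid(np.dot(weights, inputs) + bias)

# Three nodes from the previous layer feeding one node in this layer.
prev_activations = np.array([0.9, 0.1, 0.4])
weights = np.array([1.5, -2.0, 0.3])   # a negative weight is a "don't fire" message
bias = -0.5                            # shifts the activation function

print(node_activation(prev_activations, weights, bias))
```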

Communication: Thresholds and Timing

We will be looking very closely at how weights and synapses change as learning occurs, but first: what does an effective connection look like? In short, an artificial weight never has to worry about its signal being ignored because of its timing, while a synaptic transmission does. As we will see, the reasons why have far-reaching implications for information processing.

Until the end of the 20th century, neurons were believed to simply add up the electrical inputs they received from surrounding neurons and fire once this sum pushed the membrane potential past a threshold (roughly -55 mV, up from a resting potential of about -70 mV). Today, we know that neurons hold energy as weak chaotic oscillators. This means they have an orbit attractor: rather than taking the exact same path on every oscillation, they take approximately the same path. More specifically, neurons are nonharmonic oscillators, meaning the speed at which they oscillate is not constant. These are also called relaxation oscillators because they are characterized by a relaxed accumulation of energy that periodically turns into a sudden loss of it (like an action potential).

Let’s take György Buzsáki’s household example of a relaxation oscillator so we can break this down: the dripping faucet. A relaxation oscillator has three phases: the excitable state, the active state, and the refractory period. The excitable state corresponds to the faucet slowly accumulating a drop of water; if you hit the faucet or provide some other input during this phase, the water will fall early (though the droplet could have been bigger had you waited for more water to accumulate). During the active phase, the droplet falls by itself. While the water is falling, and right afterward in the refractory period, when there is little or no water, the system cannot be disturbed.

Neurons are like the faucet: immune to synaptic activity during the active and refractory periods. This implies that relaxation oscillators alternate between transmitting and receiving periods, where the length of these periods is determined by the oscillator’s frequency.
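To make the three phases concrete, here is a toy relaxation oscillator in code, in the spirit of the dripping faucet. Every parameter value is an illustrative assumption, not a physiological constant:

```python
import numpy as np

# Toy relaxation oscillator: a value builds up slowly (excitable state),
# "fires" when it crosses a threshold (active state), then is clamped
# during a refractory period in which no input can disturb it.
dt, steps = 1.0, 300
v, threshold, reset = 0.0, 1.0, 0.0
drive, leak = 0.02, 0.01       # relaxed accumulation of energy vs. leakage
refractory, cooldown = 20, 0   # steps during which the cell ignores all input

spike_times = []
for t in range(steps):
    if cooldown > 0:
        cooldown -= 1                  # refractory: the system cannot be disturbed
        continue
    v += dt * (drive - leak * v)       # excitable state: slow build-up
    if v >= threshold:                 # active state: sudden loss of energy
        spike_times.append(t)
        v, cooldown = reset, refractory

print(spike_times)  # roughly periodic firing, like drops from the faucet
```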

This property, where a well-timed input can reset an oscillator’s phase, gives networks of oscillators the ability to learn and store patterns. Artificial neural networks also store connectivity patterns, but they are completely detached from time (beyond our trying to optimize how long training runs). There, every node receives information from all the nodes in the previous layer, and the best we can do to silence an input is to adjust the weights and bias term so its activation stays near 0. Neurons, by contrast, combine high-pass (voltage-dependent) and low-pass (time-dependent) filtering to select inputs based on frequency. In fact, each part of a neuron (dendrites, axon, etc.) can function as this sort of filter-capable resonator-oscillator.

Why does this matter so much? Besides the evident information-processing differences caused by input selection, it is a determining factor in how networks process noise.

Noise: The Bias-Variance Trade-Off and Stochastic Resonance

Overfitting is the Achilles’ heel of artificial neural networks, and noise can help us with it. Overfitting occurs when a machine learning model has low bias (it captures the relationship between the variables in the training data) and high variance (the quality of its fit varies widely between data sets). This is a problem because the model performs poorly or unreliably on new data. The process of decreasing overfitting, trading a little more bias for less variance, is called regularization.

Adding noise during training is a great way to regularize artificial neural networks. The usual suspect is Gaussian noise (random noise drawn from a normal distribution), which also counts as a data augmentation technique when applied to the inputs. We can also add noise to the weights, the gradients (which we will explore later), and the activation functions. The mechanisms through which noise in these different elements improves performance have not been precisely pinned down yet, but we can look to the brain for inspiration.
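As a minimal sketch of the input-noise case (the data and the standard deviation below are arbitrary assumptions; in practice the noise level would be tuned like any other hyperparameter):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(x, stddev=0.1):
    # Jitter each training input with zero-mean Gaussian noise. Applied
    # to the inputs, this doubles as data augmentation; the same idea can
    # be applied to weights or gradients during training.
    return x + rng.normal(loc=0.0, scale=stddev, size=x.shape)

x_batch = np.array([[0.2, 0.7, 0.5],
                    [0.9, 0.1, 0.3]])
print(add_gaussian_noise(x_batch))  # a slightly different batch every epoch
```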

In neuroscience, we can think of noise in terms of stochastic resonance. Here, the oscillatory nature of neurons makes it clear that noise doesn’t always interfere with signals. Rather, it can amplify hidden or ‘learned’ signals by pushing neurons over the activation threshold at the right time. This is resonance: what happens when energy is fed into a system at its natural frequency. Rhythms of the Brain compares this to standing by a piano: its strings vibrate when force is exerted on the floor near it. Good instruments do this because they amplify sound, or in the case of brain signals, they resonate. Whether and how signals are amplified by noise in artificial networks, if not through resonance, remains to be seen.
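A toy illustration of stochastic resonance, with made-up signal and noise levels: a periodic signal that never reaches the threshold on its own gets pushed over it once noise is added, mostly near its peaks, so moderate noise reveals the hidden rhythm while heavy noise drowns it:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 1000)
threshold = 1.0
signal = 0.8 * np.sin(2 * np.pi * t)   # subthreshold: never crosses 1.0 alone

def threshold_crossings(noise_std):
    # Count how often the noisy signal exceeds the "firing" threshold.
    noisy = signal + rng.normal(0.0, noise_std, size=t.shape)
    return int(np.sum(noisy > threshold))

for std in (0.0, 0.2, 2.0):
    print(std, threshold_crossings(std))
# 0.0 -> no crossings at all; 0.2 -> crossings concentrated near the
# signal's peaks (the hidden rhythm shows through); 2.0 -> crossings
# everywhere, the signal is buried.
```

With that picture in mind, let’s take a closer look at what we mean by a gradient.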

Architectural Changes: Gradient Descent/Backpropagation and Long-Term Potentiation

Neural networks alter their connections as they learn. This leads to more efficient and more accurate processing of their inputs, taking us to the right outputs, such as whether an image shows a cat or a dog. Artificial neural networks do this for a fixed set of inputs (the training data), while the brain does it continuously. But how do they do it?

In an artificial neural network, connections cannot appear or disappear like they can in the brain; they just get stronger or weaker. At the end of the day, the goal is to minimize the error in the network’s outputs, which we measure with a cost function. A common choice is simply the mean of the squared errors between the answers our model predicts and the actual answers. This takes us to gradient descent. We call it descent because the objective is for the error to drop; the word gradient refers to how we look for this minimum: using derivatives to find the direction of steepest descent. We can picture this as a ball rolling in a valley: it goes downhill until the lowest point, where the surface is flat (i.e. the derivative is zero). In a 2D graph, we would move right if the slope were negative and left if it were positive. This comes with some challenges, mainly speed and finding the true minimum rather than getting stuck in a dip between two peaks when a deeper valley lies somewhere nearby. In some networks, gradient noise not only helps prevent overfitting but also lowers training loss; in others, it helps avoid local minima.
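A minimal sketch of that downhill walk, using a one-parameter mean-squared-error cost (the data and learning rate are made up):

```python
import numpy as np

# Fit the slope of y = w * x by gradient descent on the mean squared error.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                              # the true slope is 2

w, lr = 0.0, 0.05                        # start at the top of the hill
for step in range(100):
    pred = w * x
    grad = np.mean(2 * (pred - y) * x)   # derivative of the MSE w.r.t. w
    w -= lr * grad                       # step against the slope (descend)

print(w)  # converges near 2.0, the bottom of the valley
```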

The gradient tells us how to minimize cost, and backpropagation is how it gets computed. Picture a neural network whose output layer has two neurons: one classifies images as a sloth, the other as a panda. The neuron with the higher activation is the network’s prediction. So what happens when this prediction is wrong? We cannot alter activations directly, but we can change the weights coming from the previous layer.

Let’s say the correct answer for an image is panda, and our panda node sits at 0.2 activation when it should be at 1. We can drive this number up by strengthening the connections that excite it and weakening the connections that inhibit it. Since nodes in the previous layer with larger activation values have a stronger effect, changing the weights on those nodes has the biggest impact on the loss function (whether the weight is positive or negative). This can be compared to Hebbian learning in neuroscience, where connections are strengthened most between neurons that fire together.

At the same time, we would like the nodes with strong positive connections to our panda node to have higher activations; as we’ve established, we cannot change activations directly, but we can adjust the weights coming into those nodes from the second-to-last layer. In fact, we do exactly that, which is why we call this backpropagation. Going back to our output, the sloth node will have its own opinions about how each layer of weights should change, so backpropagation averages the desired changes in order to minimize loss for both (in practice this is done for a subset of the training data, called a mini-batch, resulting in stochastic gradient descent).
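Here is a minimal backpropagation sketch for a tiny one-hidden-layer network, assuming sigmoid activations and a squared-error cost as described above; the data, shapes, and learning rate are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# A made-up two-class problem (think "sloth vs. panda" as 0 vs. 1).
X = rng.normal(size=(8, 4))                    # mini-batch of 8 inputs, 4 features
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)

W1, b1 = rng.normal(size=(4, 5)), np.zeros(5)  # input -> hidden
W2, b2 = rng.normal(size=(5, 1)), np.zeros(1)  # hidden -> output
lr = 0.5

for epoch in range(500):
    # Forward pass.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: the error flows from the output layer back toward
    # the input, and gradients are averaged over the mini-batch (this
    # averaging is the "stochastic" in stochastic gradient descent).
    d_out = (out - y) * out * (1 - out)        # output-layer error
    d_h = (d_out @ W2.T) * h * (1 - h)         # propagated back one layer

    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

print(np.round(out.ravel(), 2), y.ravel())     # predictions move toward the labels
```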

On the other hand, the brain’s ability to alter its connections is called synaptic plasticity, and one of the mechanisms through which it does this is Long-Term Potentiation (LTP). Here, synapses are strengthened or weakened based on recent activity, which is regarded as a basis for learning and memory. There are several types of LTP with different properties, including Hebbian LTP, where ‘neurons that fire together wire together’, much as in an artificial neural network. There is also non-Hebbian LTP, where the pre- and post-synaptic neurons need not fire together, and anti-Hebbian LTP, where the post-synaptic neuron must be hyperpolarized.
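A minimal sketch of a Hebbian update, assuming the textbook rule where the weight change is proportional to the product of pre- and post-synaptic activity (the learning rate and firing patterns are arbitrary):

```python
import numpy as np

def hebbian_update(w, pre, post, lr=0.1):
    # "Fire together, wire together": each synapse grows in proportion
    # to the product of its pre- and post-synaptic activity.
    return w + lr * np.outer(post, pre)

pre = np.array([1.0, 0.0, 1.0])    # presynaptic firing pattern
post = np.array([1.0, 0.0])        # postsynaptic firing pattern
w = np.zeros((2, 3))               # one weight per synapse

print(hebbian_update(w, pre, post))
# Only the synapses between co-active neurons were strengthened.
```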

By far the best-understood example of LTP is NMDA-receptor-dependent LTP in the CA1 region of the adult hippocampus. Using it as an example, we can explore some of LTP’s most common properties. First, this LTP is input-specific: inducing LTP at one synapse does not alter other synapses. Second, there is associativity: when activity at one pathway is too weak to induce LTP on its own, strong simultaneous activity at another pathway can induce LTP at both. Last, we have persistence: the potentiation is long-term, lasting from minutes to months. The main mechanism identified so far is the modification of dendritic spines, the tiny projections where signals are exchanged.

As we come closer to understanding LTP, neurons, and synapses, we come closer to understanding the mind. We are approaching our potential to build something as complex as ourselves, and artificial neural networks are the furthest we have come.
