Introduction to neural networks — weights, biases and activation

Andrea D'Agostino
6 min read · Dec 27, 2021


How a neural network learns through weights, biases and activation functions

Photo by Robina Weermeijer on Unsplash

We often hear that artificial neural networks are representations of human brain neurons within a computer. These sets of neurons form interconnected networks, but the processes that trigger their events and activations are quite different from those of a real brain.

A neuron taken individually is relatively useless, but when combined with hundreds or thousands of other neurons, it forms an interconnected network that often outperforms any other machine learning algorithm.

Brief historical background

The concept of a neural network is quite old: the first ideas of modeling software inspired by the human brain date back to the early 1940s, with the work of Donald Hebb, McCulloch and Pitts. For over 20 years the concept remained at the level of theory. Training neural networks became possible only with greater computational power and the creation of the backpropagation algorithm by Paul Werbos, an efficient mechanism that allows the network to learn by propagating a neuron's feedback to the neurons that precede it.

Later, the work of Geoffrey Hinton, Andrew Ng, Jeff Dean and other researchers made the neural network paradigm popular and effective for a whole range of problems.

Today neural networks are used in a myriad of tasks thanks to their ability to solve problems previously considered impossible, such as language translation, video and audio synthesis, and autonomous driving.

Natural and artificial neuron — what are the differences?

While it is true that neural networks are inspired by natural neurons, the comparison is almost misleading, as their anatomies and behaviors are different. I won’t go deep into the neuroscientific aspect, but natural neurons seem to prefer activation based on a “switch”, an on or off state. Following the period of activity, among other things, natural neurons exhibit a refractory period, during which their ability to activate again is suppressed. This behavior is described by the concept of the action potential.

General view of the anatomy of a neuron. Image by Author.
Anatomy of an artificial neuron. Image by Author.

Neural networks as “black boxes”

Neural networks are considered to be black boxes: we don’t know why they achieve the performance they do, but we do know how they do it.

The so-called dense layers, the most common layers in a neural network, create the interconnections between the network’s layers. Each neuron is connected to every neuron of the next layer, which means that its output value becomes an input for the next neurons. Each connection between neurons has a weight, one of the factors changed during training. The weight of a connection determines how much of the input is passed between neurons. This behavior follows the formula inputs * weights.

Once a neuron receives inputs from all the neurons connected to it, a bias is added: a constant value, also tuned during training, that is added to the weighted computation. A neural network is able to generalize and model a real-world problem (which is nothing more than a mathematical function) thanks to the constant adjustment of weights and biases, which modulate the output and input of each single neuron until the network approaches an acceptable solution.

The output of a neuron is expressed by the formula
output = inputs * weights + bias
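
For example, here is a minimal sketch of this computation for a single neuron in plain Python (the input, weight and bias values are invented purely for illustration):

inputs = [1.0, 2.0, 3.0]    # outputs coming from three neurons of the previous layer
weights = [0.2, 0.8, -0.5]  # one weight per incoming connection
bias = 2.0                  # constant added to the weighted sum

# output = inputs * weights + bias
output = sum(i * w for i, w in zip(inputs, weights)) + bias
print(output)  # 0.2 + 1.6 - 1.5 + 2.0 = 2.3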

The adjustment of weights and biases is done in the hidden layers, which are the layers between the input layer and the output layer. They are called “hidden” because we do not see the adjustment behavior of weights and biases. This is why neural networks are black boxes.

How a neural network learns

What makes neural networks such a complex topic is the enormous amount of computation that occurs at both the network level and the single-neuron level. Along with the weights and biases there are the activation functions, which add further mathematical complexity but greatly influence the performance of a neural network.

Weights and Biases

Weights and bias can be interpreted as a system of knobs that we can manipulate to optimize our model — like when we try to tune our radio by turning the knobs to find the desired frequency. The main difference is that in a neural network, we have hundreds if not thousands of knobs to turn to achieve the final result.

Since weights and biases are parameters of the network, they are what changes when we turn these imaginary knobs. Because the weights are multiplied by the inputs, they affect the magnitude of the latter. The bias, on the other hand, since it is added to the whole expression, shifts the function in the plane. Let’s see some examples.

Recall that the formula is output = inputs * weights + bias

How the output of a neuron changes based on weight and bias. Image by Author.
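
To reproduce the idea of the figure numerically, here is a tiny sketch (the input and the knob settings are arbitrary):

x = 1.5  # a fixed input

for w, b in [(1.0, 0.0), (2.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]:
    # the weight scales the input, the bias shifts the result
    print(f"weight={w:+.1f}, bias={b:+.1f} -> output={w * x + b:+.2f}")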

As you can see, weights and biases impact the behavior of each artificial neuron, but each in its own way. The weights are usually initialized randomly, while the biases are initialized at 0.

The behavior of a neuron is also influenced by its activation function which, analogous to the action potential of a natural neuron, defines the activation conditions and the corresponding values of the final output.

Activation Function

The topic of activation functions deserves an article of its own, but here I will present a general overview. If you remember, I mentioned that a natural neuron has a switch-like activation. In computer/math jargon, we call this function a step function.

Behavior of a step function. Image by Author.

Following the formula

1 if x > 0; 0 if x ≤ 0

the step function allows the neuron to return 1 if the input is greater than 0, and 0 if the input is less than or equal to 0. This simulates the behavior of a natural neuron and is applied to the now-familiar quantity

output = sum(inputs*weights) + bias

The step function is a very simple function, and in the AI field there is a tendency to use more complex activation functions, such as the rectified linear unit (ReLU) and softmax.
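
Here is a minimal sketch of these functions in Python (the test inputs are arbitrary, and the softmax is the standard formulation rather than code from this article):

import math

def step(x):
    # on/off switch: 1 if x > 0; 0 if x <= 0
    return 1 if x > 0 else 0

def relu(x):
    # rectified linear unit: keeps positive values, clips negatives to 0
    return max(0.0, x)

def softmax(xs):
    # rescales a list of values into probabilities that sum to 1
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

for x in (-2.0, 0.0, 3.0):
    print(x, step(x), relu(x))

print(softmax([-2.0, 0.0, 3.0]))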

How to code a small neural network in Python

Let’s create a small neural network with 4 inputs and 3 neurons to understand how the calculation of weights and biases works.

Small network architecture we’re going to write in Python. Image by Author.

Let’s start by defining these parameters manually for example purposes
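
The embedded snippet is not reproduced here, so the following is a plain-Python sketch of what such parameters might look like (all numeric values are arbitrary examples):

inputs = [1.0, 2.0, 3.0, 2.5]  # the 4 input values

# one row of 4 weights for each of the 3 neurons
weights = [[0.2, 0.8, -0.5, 1.0],
           [0.5, -0.91, 0.26, -0.5],
           [-0.26, -0.27, 0.17, 0.87]]

# one bias per neuron
biases = [2.0, 3.0, 0.5]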

Now let’s create the loop that will create our small neural network
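
One way to write that loop in plain Python, following the sum(inputs*weights) + bias formula from earlier:

layer_outputs = []  # will hold the output of each of the 3 neurons

for neuron_weights, neuron_bias in zip(weights, biases):
    neuron_output = 0.0
    # multiply each input by its weight and accumulate
    for n_input, weight in zip(inputs, neuron_weights):
        neuron_output += n_input * weight
    # add the neuron's bias to the weighted sum
    neuron_output += neuron_bias
    layer_outputs.append(neuron_output)

print(layer_outputs)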

This is the final output
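
With the example values defined above, the loop prints [4.8, 1.21, 2.385], one output per neuron.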

The result of our small neural network. Image by Author.


