Deep Learning from scratch: Neurons, Layers & Activations

Niels Verleysen
5 min read · Sep 12, 2022


In the past decade we have seen a huge change in our daily lives due to new technologies. Behind the scenes these changes are even more drastic, with whole factories and warehouses becoming almost completely automated. Who knows how much further this will go? With new developments like protein folding and self-driving cars, it seems like the sky is the limit. A lot of these advancements can be traced back to a single technology: deep learning.

It’s clear that deep learning is a vital technology in our lives today and it will most likely remain so for some time. However, I see time and time again that people have a lot of misconceptions about what deep learning is. To help remedy this, I want to reveal the inner workings by coding it up from scratch. This first blog of three explains the very basics of neural networks and how they make predictions. The following two parts will discuss how these networks learn and how you can improve the performance of your own networks with a few small changes.

The basics

I could start by explaining how neural networks are based on the neurons in our brain. Although that would be correct, it only further mystifies how these networks work in practice. A better approach is to start at a high level and then zoom in. In its essence a neural network as a unit is quite simple:

You start by passing an input to the front of the network. It then returns an output at the back. By comparing this output against the desired output the network can learn!

To describe this a bit more mathematically, a neural network is a function that maps a set of inputs to outputs. When you learn about functions, the examples you are shown are deterministic, like a sine wave or a straight line. But what if you can’t define the relation between inputs and outputs by hand? That is where machine learning comes in. By showing examples of inputs with their desired outputs, these algorithms can learn the relation between the two. And one such machine learning algorithm is deep learning.

With this knowledge you now have a high level understanding of what deep learning is. Let’s zoom in to a lower level and look at the basic building blocks and what they do.

Neurons

As the name would suggest, a neural network is at its core a network of neurons, duh… So let’s first look at what a neuron is and then at how these networks are formed.

A neuron acts in the same way as a complete neural network. It receives one or more inputs from which it produces an output. The difference is that a neuron has only a single output, while a network can have many. To compute this output the neuron has a weight associated with each input and a bias. The output is then determined by multiplying each input with its weight and summing these products together with the bias. Note: this comes down to a dot product of the inputs and the weights, after which the bias is added to the result.

# Numpy will do the heavy lifting for us
import numpy as np

# Let's say our input contains 5 numbers
inputs = [4, 9, 2, 1.3, 5.6]
# The neuron therefore needs to have 5 weights and of course a bias
weights = [0.78, -0.34, 0.7, -0.2, -0.11]
bias = 1.3
# The output of the neuron can then be computed as follows:
output = weights[0] * inputs[0] + weights[1] * inputs[1] + \
         weights[2] * inputs[2] + weights[3] * inputs[3] + \
         weights[4] * inputs[4] + bias
# This is actually the same as doing a dot product of the input
# and the weights, and then adding the bias to it
output = np.dot(weights, inputs) + bias

Activations

A neuron by itself is linear. It doesn’t matter how many you combine, the network won’t be able to model a nonlinear relation. But, we want our neural networks to be able to model any relation. To achieve this we should add some nonlinearity, which is accomplished by adding an activation function. Activation functions come in many forms, but they are always nonlinear. Here are some examples:

From left to right: the ReLU, Binary Step and Tanh activation functions
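Since the figure is easier to grasp with some code next to it, here is a minimal sketch of these three functions using numpy (the function names are just illustrative, and ReLU gets its own implementation further down):

# Rectified Linear Unit: negative inputs become 0, the rest pass through
def relu(x):
    return np.maximum(0, x)

# Binary step: 0 for negative inputs, 1 for everything else
def binary_step(x):
    return np.where(x < 0, 0, 1)

# Tanh: squashes any input into the range (-1, 1)
def tanh_activation(x):
    return np.tanh(x)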

One of the most commonly used activation functions is the Rectified Linear Unit, or ReLU for short. This function is very simple and comes down to two rules: if the input is smaller than zero, the output is zero; otherwise, the output is simply the input itself.

After the neuron has computed an output as before, this number is passed through the activation function to get the activated output of the neuron. Putting everything together, the weights and bias basically change what the activation function looks like.

By changing the weights and biases you influence the input of the activation function. Intuitively you can see this as changing the behavior of the activation function for the same input.

With this idea in mind we can code up the ReLU activation and add it to our neuron from before.

# ReLU returns 0 if x is smaller than 0, and x otherwise
def ReLU(x):
    if x < 0:
        return 0
    else:
        return x

# Or simply:
def ReLU(x):
    return np.maximum(0, x)
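
As a quick illustration of how the weights influence this (the tweaked weight below is just an example value), we can activate the output of our neuron from before and see what happens when we flip the sign of a single weight:

# Activated output of our neuron from before (roughly 1.88)
activated_output = ReLU(np.dot(weights, inputs) + bias)

# Flipping the sign of the first weight pushes the weighted sum below
# zero, so ReLU now clamps the activated output to 0 for the same inputs
tweaked_weights = [-0.78, -0.34, 0.7, -0.2, -0.11]
tweaked_output = ReLU(np.dot(tweaked_weights, inputs) + bias)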

Layers

The relations that can be modeled by a single neuron are of course limited. A neuron also has only a single value as output. To alleviate this we combine multiple neurons together in a network. This is structurally done through layers.

A layer is a group of neurons that receive the same inputs. The idea is that each of these neurons can then focus on specific features of the relation to be modeled. Mathematically not much changes. The forward pass of a layer is a matrix multiplication of the inputs with the weight matrix, after which the biases are added. In code we get something like this:

# We keep the same inputs as before
inputs = [4, 9, 2, 1.3, 5.6]
# Our layer has three neurons, so we need three sets
# of weights and three biases
weights = [
    [0.78, -0.34, 0.7, -0.2, -0.11],  # Same neuron as before
    [-0.24, 1.24, 0.43, 0.04, -0.3],
    [0.5, 0.65, -0.3, 0.64, 0.1]
]
biases = [1.3, -0.5, 2.4]
# The forward pass remains the same
output = np.dot(weights, inputs) + biases
# The activation also remains the same, as the numpy maximum function
# can deal with vectors as well
output = ReLU(output)

By combining multiple layers together, we add depth to the network. This way, neurons in later layers can combine outputs of earlier neurons, deepening the relationships that can be learned. Two layers can be combined by passing on all the outputs of one layer as inputs for each neuron of the next. In practice this comes down to using the output matrix of one layer as the input matrix of the next.
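
To make that concrete, here is a minimal sketch of a second layer stacked on top of the one above (the weights and biases of this second layer are made-up example values):

# A second layer with two neurons, each receiving the three outputs
# of the previous layer as its inputs
weights2 = [
    [0.1, -0.5, 0.2],
    [0.9, 0.3, -0.4]
]
biases2 = [0.5, -1.0]
# The output of the first layer simply becomes the input of the second
output2 = ReLU(np.dot(weights2, output) + biases2)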

And with that we went through the core principles of what a neural network is and how a forward pass is performed. If you want to check out the code in more depth, you can find the repository here. In the next blog we’ll check out how the network learns from examples through something called backpropagation.


Niels Verleysen

Senior Data Scientist @ Verhaert, performing applied data science research for digital & physical product development.