How to Build a Neural Network From Scratch

Anjaneya Tripathi
Published in The Startup
6 min read · Nov 25, 2020

Is that a cat? Is it a dog? Well, let us create a neural network and find out! Neural networks have become an integral part of image classification, natural language processing, speech recognition and whatnot. This seemingly complex entity isn’t that complex, after all. Let us get down to the basics and build our very own neural network!

Introduction

The primary objective of neural networks is to replicate the human brain: to make decisions and perform tasks just as we do. Our nervous system consists of billions of neurons, and each neuron receives impulses from input sources such as our eyes, ears, and mouth. The impulse is processed, and the output is transmitted via the axon. This output serves as the input for the next neuron. This process is repeated many times, after which a decision is made. As you can see below, a parallel can be drawn between a biological and an artificial neuron.

[Figure: a biological neuron alongside its artificial counterpart. Source: ResearchGate]

Let us take some input, maybe auditory and visual. The information from both these sensory organs is passed on to the brain. So how does the brain process it? The data is passed to neurons (which form the layers of the neural network), where it is processed. If the processed signal crosses a certain threshold, it is passed forward, and the neuron is said to be activated. This process is repeated for every layer of the neural network. The final layer generates an output and tells us whether we have seen a dog, a cat, or a mouse.

With that brief introduction, let’s start digging deeper and see what is happening under the hood.

Our Goal

We aim to create a neural network that mimics the diagram below, with a bias of 1 for each layer.

Forward Propagation

We have 3 inputs, A, B and C. Let their contributions be w1, w2 and w3 respectively. These contributions are called weights, and they are usually different because each input need not be of equal importance. For example, when determining the price of a house, the locality and neighborhood may matter more than features such as the area of the house or the number of balconies. We compute the weighted sum of the input features and send it to the neuron.

a = (w1*A + w2*B + w3*C) + b

Here, b is the bias term, and a is the resultant value we get from the input signals.
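This weighted sum can be sketched directly in Python. The input and weight values below are illustrative assumptions, not values from the article:

```python
# A single neuron's pre-activation value: the weighted sum of the
# inputs plus the bias term, exactly as in the equation above.
def neuron_sum(A, B, C, w1, w2, w3, b):
    return w1 * A + w2 * B + w3 * C + b

# Illustrative inputs and weights (assumptions for the sketch)
a = neuron_sum(1.0, 2.0, 3.0, 0.5, -0.25, 0.1, 0.05)  # ≈ 0.35
```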

Getting Started

We can rewrite the above equation more concisely using matrices. Let X be the set of input features (A, B and C) and W1 the weight matrix. Let us define our input matrix, expected result, weights, and biases.

The dimension of X is [number of entries, number of input parameters] while that of W1 is [number of neurons in input layer, number of neurons in the hidden layer]. We then add a bias b1 having dimensions [1, number of neurons in the hidden layer].

The output of the first layer is computed as z1 = X·W1 + b1.
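The layer-one computation can be sketched with NumPy. The sizes below (4 entries, 3 input features, 5 hidden neurons) are illustrative assumptions:

```python
import numpy as np

np.random.seed(0)
X = np.random.rand(4, 3)    # [number of entries, number of input parameters]
W1 = np.random.rand(3, 5)   # [input-layer neurons, hidden-layer neurons]
b1 = np.zeros((1, 5))       # [1, hidden-layer neurons]

# Broadcasting adds the bias row to every entry's weighted sum
z1 = X @ W1 + b1
```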

This is then sent to a neuron where it is operated on by an activation function. You may be wondering, why do we need an activation function?

Sigmoid Activation Function

As you can see, the above equation closely resembles linear regression. By introducing a non-linear activation function, we can make the neuron learn and perform more complex tasks. The activation function we’ll be using here is the sigmoid activation function.

[Figure: the sigmoid activation function. Source: ResearchGate]
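The sigmoid function squashes any real number into the range (0, 1), which is what introduces the non-linearity. A minimal sketch:

```python
import numpy as np

def sigmoid(z):
    # sigmoid(z) = 1 / (1 + e^(-z)); maps any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

sigmoid(0.0)  # 0.5, the midpoint of the curve
```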

Once the activation is computed, we multiply the activated first-layer output by the second set of weights W2; the result serves as the input for the output layer.
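The full forward pass can be sketched as follows. The layer sizes and random initial weights are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(1)
X = np.random.rand(4, 3)                        # input entries
W1 = np.random.randn(3, 5); b1 = np.zeros((1, 5))
W2 = np.random.randn(5, 1); b2 = np.zeros((1, 1))

a1 = sigmoid(X @ W1 + b1)   # activated first-layer output
z2 = a1 @ W2 + b2           # product with the second set of weights, plus bias
y_hat = sigmoid(z2)         # final prediction, one value per entry
```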

Forward propagation is done, but is that it? No way! There is a huge chance that our initial weights may be incorrect. As a result, the predicted values that we obtain will not be satisfactory. So, what do we do? Let’s find out how wrong we are and tune the weights, shall we?

Computing Loss and Cost Function

It’s time to help our neural network learn from its mistakes and become more accurate. We compute the loss of our model and calculate its cost; here, we take the cost to be the sum of squared errors.
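The sum-of-squared-errors cost can be sketched in a couple of lines, with y as the expected output and y_hat as the prediction (the sample values are illustrative):

```python
import numpy as np

def cost(y, y_hat):
    # Sum of squared errors over all entries
    return np.sum((y - y_hat) ** 2)

J = cost(np.array([[1.0], [0.0]]), np.array([[0.9], [0.2]]))  # ≈ 0.05
```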

Now that we have calculated the cost, what do we do?

Gradient Descent

This is the whole crux of backward propagation. Our cost function (also called the loss function), as you can see, is parabolic, so we need to reach the minimum, because that is what gives us the best result. So how do we do it? We define a learning rate alpha and gradually move down the curve in the direction of the minimum. How is this done mathematically?

[Figure: gradient descent moving down the cost curve toward the minimum. Source: Simplilearn]

We now update the weights! The learning rate alpha gives us the magnitude by which we move in the direction of the minimum. If the learning rate is too high, we may shoot past the desired value; if it is too small, it will take a long time to reach it.
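The update rule can be illustrated on a simple parabolic cost, J(w) = (w − 3)², whose minimum sits at w = 3. The starting point, alpha, and step count are assumptions for the sketch:

```python
# Gradient descent on J(w) = (w - 3)^2: repeatedly step against the
# gradient, scaled by the learning rate alpha.
def gradient_descent(w, alpha, steps):
    for _ in range(steps):
        grad = 2 * (w - 3)    # dJ/dw for J(w) = (w - 3)^2
        w = w - alpha * grad  # move toward the minimum at w = 3
    return w

w_final = gradient_descent(0.0, 0.1, 100)  # converges very close to 3
```

With alpha = 0.1 the gap to the minimum shrinks by a factor of 0.8 every step; a much larger alpha would overshoot, and a tiny one would crawl.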

[Figure: the effect of different learning rates on convergence. Source: Jeremy Jordan]

You may be wondering, what is that back propagation function? Let me tell you.

Backward Propagation

The real learning happens here, quite literally! We finally get to train our neural network and teach it to make better predictions.

As seen above, we have found our cost function. It’s time to find the contribution of each weight and tune it to minimize the cost. We find the partial derivative of J with respect to our weights.

We will use the chain rule to compute dJ/dW.
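A chain-rule backward pass under the sum-of-squared-errors cost can be sketched as below. The variable names and layer sizes follow the forward-pass description but are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backward(X, y, W1, b1, W2, b2):
    # Recompute the forward pass so the intermediates are available
    a1 = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(a1 @ W2 + b2)

    # Chain rule at the output: dJ/dz2 = dJ/dy_hat * dy_hat/dz2,
    # where dJ/dy_hat = 2(y_hat - y) and sigmoid'(z) = s(z)(1 - s(z))
    delta2 = 2 * (y_hat - y) * y_hat * (1 - y_hat)
    dW2 = a1.T @ delta2
    db2 = delta2.sum(axis=0, keepdims=True)

    # Propagate the error back through W2 and the hidden sigmoid
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)
    dW1 = X.T @ delta1
    db1 = delta1.sum(axis=0, keepdims=True)
    return dW1, db1, dW2, db2

# Illustrative shapes: 4 entries, 3 inputs, 5 hidden neurons, 1 output
np.random.seed(2)
X = np.random.rand(4, 3); y = np.random.rand(4, 1)
W1 = np.random.randn(3, 5); b1 = np.zeros((1, 5))
W2 = np.random.randn(5, 1); b2 = np.zeros((1, 1))
dW1, db1, dW2, db2 = backward(X, y, W1, b1, W2, b2)
```

Each gradient has the same shape as the parameter it tunes, so the update `W = W - alpha * dW` lines up element by element.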

Once we back propagate, we are back at the start. However, we have a better weight matrix to start with now and the next iteration will have more accurate results.

The Final Code
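Putting all the pieces together, a compact end-to-end sketch might look like the following. The XOR-style toy data, layer sizes, learning rate, and iteration count are all illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(42)

# Toy dataset (assumption): 2 input features, 1 expected output per entry
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = np.random.randn(2, 4); b1 = np.zeros((1, 4))
W2 = np.random.randn(4, 1); b2 = np.zeros((1, 1))
alpha = 0.5

costs = []
for _ in range(5000):
    # Forward propagation
    a1 = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(a1 @ W2 + b2)
    costs.append(np.sum((y_hat - y) ** 2))  # sum-of-squared-errors cost

    # Backward propagation via the chain rule
    delta2 = 2 * (y_hat - y) * y_hat * (1 - y_hat)
    delta1 = (delta2 @ W2.T) * a1 * (1 - a1)

    # Gradient-descent updates
    W2 -= alpha * (a1.T @ delta2); b2 -= alpha * delta2.sum(0, keepdims=True)
    W1 -= alpha * (X.T @ delta1);  b1 -= alpha * delta1.sum(0, keepdims=True)
```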

After many iterations, this is what we finally achieve!

[Figure: plotting cost (or loss) vs. iterations]

Conclusion

We have finally constructed our very own neural network from scratch in Python. You should be really proud of yourselves. However, there is a lot more to learn and this is just the beginning. So, how can we improve what we have done right now?

  1. Use different activation functions such as ReLU, tanh (hyperbolic tangent), Leaky ReLU, softmax, etc.
  2. Alternative optimization algorithms can be used to achieve the same or better results. A good article to get started with is:

Hope you had a great time learning about neural nets!
