Building a Neural Network from scratch in Python

Pavan Kodi · Published in Analytics Vidhya · 6 min read · Mar 13, 2020


A way to understand the math behind the inner workings of a neural network


I have always believed that to master any domain, we should understand its core workings. In order to develop our own libraries, we need to understand the math behind a neural network, so I decided to build one from scratch without any deep learning libraries.

In this post we will go through the mathematics behind a neural network and code one from scratch in Python. We will build a network out of fully connected layers, and eventually we will be able to create networks in a modular fashion.

I’m assuming you already have some knowledge of neural networks. The purpose here is not to explain neural network concepts or why we build these models, but to show how to write a proper implementation.

Components of a Neural Network

Neural networks consist of the following components:

  • An input layer, X
  • An arbitrary amount of hidden layers
  • An output layer, ŷ
  • A set of weights and biases between each layer, W and b
  • A choice of activation function for each hidden layer, A. In this tutorial, we’ll use the sigmoid activation function.

The diagram below shows the architecture of a 2-layer neural network (note that the input layer is typically excluded when counting the number of layers in a neural network).

2-layer Neural Network

Building the parts of our algorithm

The main steps for building a Neural Network are:

  1. Define the model structure (such as number of input features and hidden layers)
  2. Initialize the model’s parameters.
  3. Loop until the cost function is minimized:
  • Calculate current loss (forward propagation)
  • Calculate current gradient (backward propagation)
  • Update parameters (gradient descent)

You often build steps 1–3 separately and then integrate them into one function called model().

Implementation

So, we now know the main ideas behind neural networks. Let us start implementing these ideas in code. As mentioned, we are not going to use any deep learning libraries, so we will mostly rely on NumPy to perform the mathematical computations efficiently.

import numpy as np

The first step in building our neural network will be defining the model structure (layer_dim) before initializing the parameters. We need to initialize two parameters for each of the neurons in each layer: 1) a weight and 2) a bias.

These weights and biases are declared in vectorized form. That means that instead of initializing weights and biases for each individual neuron in every single layer, we create one vector (or matrix) of weights and another of biases per layer. (Note that this vectorized implementation boils down to matrix operations, which are usually far more efficient than explicit loops.)

These weight and bias vectors will be combined with the input to the layer; we then apply the sigmoid function to that combination and pass the result on as the input to the next layer.
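
As a tiny illustration of the payoff (the shapes here are made up for the example):

# A layer of 4 neurons fed by 3 inputs, over a batch of 5 examples
W = np.random.randn(4, 3)       # one weight matrix for the whole layer
b = np.zeros((4, 1))            # one bias vector for the whole layer
A_prev = np.random.randn(3, 5)  # activations from the previous layer

# A single matrix product computes every neuron's pre-activation
# for every example at once; no per-neuron loop required
Z = np.dot(W, A_prev) + b       # shape (4, 5)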

layer_dim holds the dimensions of each layer. We pass these dimensions to the init_parms function, which uses them to initialize the parameters. These parameters are stored in a dictionary called params, so params['W1'] represents the weight matrix for layer 1.
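
As a rough sketch of what init_parms might look like (the exact body is my reconstruction; layer_dim is assumed to be a plain Python list such as [input_size, hidden_size, output_size]):

def init_parms(layer_dim):
    np.random.seed(3)  # fixed seed so runs are reproducible
    params = {}
    L = len(layer_dim)
    for l in range(1, L):
        # Small random weights break the symmetry between neurons;
        # biases can safely start at zero
        params['W' + str(l)] = np.random.randn(layer_dim[l], layer_dim[l - 1]) * 0.01
        params['b' + str(l)] = np.zeros((layer_dim[l], 1))
    return params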

Cool! We have initialized the weights and biases, and now we will define the sigmoid function. It computes the value of the sigmoid for any given value of Z and also stores that value as a cache; we keep cache values because we will need them when implementing backpropagation. The Z here is the linear hypothesis. The sigmoid is our activation function; different activation functions can give better performance, but we will stick to sigmoid for the sake of simplicity. (Note that ReLU is the most widely used activation function.)

The job of an activation function is to shape the output of a neuron: it turns the linear output into a non-linear one.
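
A sigmoid helper along these lines matches that description (returning the cache alongside the activation is the convention the description above implies):

def sigmoid(Z):
    # Squash the linear hypothesis Z into the range (0, 1)
    A = 1 / (1 + np.exp(-Z))
    cache = Z  # keep Z around; the backward pass needs it
    return A, cache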

Forward Propagation

Now we can start writing the code for forward propagation. Forward propagation takes the values from the previous layer and gives them as input to the next layer. The function below takes the training data and parameters as inputs, generates the output of one layer, and then feeds that output to the next layer, and so on.

A_prev is the input to the first layer. We loop through all the layers of the network and compute the linear hypothesis; the value of Z (the linear hypothesis) is then given to the sigmoid activation function. Cache values are stored along the way and accumulated in caches. Finally, the function returns the generated value and the stored caches.
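
A sketch of that loop, reusing the sigmoid helper above (the name forward_prop and the exact cache layout are my assumptions):

def forward_prop(X, params):
    A = X  # the input X acts as A_prev for the first layer
    caches = []
    L = len(params) // 2  # every layer contributes one W and one b
    for l in range(1, L + 1):
        A_prev = A
        W = params['W' + str(l)]
        b = params['b' + str(l)]
        # Linear hypothesis, then the sigmoid activation
        Z = np.dot(W, A_prev) + b
        A, activation_cache = sigmoid(Z)
        # Store everything the backward pass will need for this layer
        caches.append(((A_prev, W, b), activation_cache))
    return A, caches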

Now it’s time to define our cost function to check the model’s performance.

As the value of the cost function decreases, the performance of our model becomes better. The value of the cost function can be minimized by updating the values of the parameters of each of the layers in the neural network. Algorithms such as Gradient Descent are used to update these values in such a way that the cost function is minimized.
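
With a sigmoid output layer, the cross-entropy cost is a natural choice; a minimal version (assuming the labels Y and predictions A are row vectors of shape (1, m)):

def cost_function(A, Y):
    m = Y.shape[1]
    # Cross-entropy cost averaged over the m training examples
    cost = -(1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    return np.squeeze(cost)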

Backward propagation

Now that we’ve measured the error of our prediction (the loss), we need a way to propagate the error backwards and update our weights and biases. In order to know by how much to adjust the weights and biases, we need the derivative of the loss function with respect to the weights and biases.

Gradient descent algorithm
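
A backward step for a single sigmoid layer might look like the sketch below (one_layer_backward is an assumed name; the formulae for dW, db and dA_prev follow from the chain rule):

def one_layer_backward(dA, cache):
    linear_cache, activation_cache = cache
    A_prev, W, b = linear_cache
    Z = activation_cache
    m = A_prev.shape[1]

    # sigmoid'(Z) = s * (1 - s), so dZ = dA * s * (1 - s)
    s = 1 / (1 + np.exp(-Z))
    dZ = dA * s * (1 - s)

    # Chain rule: gradients w.r.t. the weights, biases and previous activation
    dW = (1 / m) * np.dot(dZ, A_prev.T)
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db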

The code above runs the backpropagation step for a single layer. It calculates the gradient values for the sigmoid units of one layer using the cache values we stored previously. In the activation cache we stored the value of Z for that layer; using this value we calculate dZ, the derivative of the cost function with respect to the linear output of the given neuron.

Now we can calculate dW, db and dA_prev, the derivatives of the cost function with respect to the weights, the biases and the previous activation, respectively. I have used the formulae directly in the code; if you are not familiar with calculus they might seem complicated (the math behind backpropagation is fairly involved).

Now we will implement backpropagation for the entire neural network in the function backprop. Here we create a dictionary that maps gradients to each layer, then loop through the model in the backwards direction and compute the gradients.

Once we have looped through all the layers and computed the gradients, we store those values in the grads dictionary and return it.
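
Under the same assumptions, backprop could be sketched like this (the starting dA is the derivative of the cross-entropy cost with respect to the final activation AL):

def backprop(AL, Y, caches):
    grads = {}
    L = len(caches)
    # Derivative of the cross-entropy cost w.r.t. the output activation AL
    dA = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Walk backwards from the last layer to the first
    for l in reversed(range(L)):
        dA, dW, db = one_layer_backward(dA, caches[l])
        grads['dW' + str(l + 1)] = dW
        grads['db' + str(l + 1)] = db
    return grads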

Great! We have implemented backpropagation for all layers manually with NumPy.

Update parameters

Finally, using these gradient values, we update the parameters for each layer. The function update_parameters goes through all the layers, updates the parameters, and returns them.
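A sketch of update_parameters, applying one plain gradient descent step per layer (learning_rate is a hyperparameter you pass in):

def update_parameters(params, grads, learning_rate):
    L = len(params) // 2
    for l in range(1, L + 1):
        # Gradient descent: move each parameter against its gradient
        params['W' + str(l)] -= learning_rate * grads['dW' + str(l)]
        params['b' + str(l)] -= learning_rate * grads['db' + str(l)]
    return params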

Now it’s time to put it all together. We will create a function called model to train our neural network.
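
Tying the earlier sketches together, model might look like this (the epochs and learning_rate defaults are illustrative, not tuned values):

def model(X, Y, layer_dim, epochs=3000, learning_rate=0.03):
    params = init_parms(layer_dim)
    cost_history = []
    for i in range(epochs):
        # One full pass: forward, cost, backward, update
        AL, caches = forward_prop(X, params)
        cost = cost_function(AL, Y)
        grads = backprop(AL, Y, caches)
        params = update_parameters(params, grads, learning_rate)
        cost_history.append(cost)
    return params, cost_history

Training then reduces to a single call, e.g. params, costs = model(X_train, Y_train, [X_train.shape[0], 10, 1]), after which costs should show the loss falling epoch by epoch.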

Done! We built a neural network from scratch without using any deep learning libraries.

Conclusion

In this post I showed how to make a proper implementation of a neural network from scratch (assuming you know the basics of neural networks). There are tons of resources online for learning the important parts of neural networks, such as vectorized implementation, backward propagation, gradient descent and the underlying calculus. Go and explore them.

Happy deep learning!
