Concept of Backpropagation in Neural Network

Abhishek Kumar Pandey
Oct 31, 2019

The concept of a Perceptron:

The first and simplest type of neural network is called the perceptron. A perceptron is a simple model of a biological neuron in an artificial neural network. The algorithm was the first step planned for a machine implementation of image recognition.

Left: A biological perceptron, Right: A simple ANN

Early artificial neural networks were not able to learn correctly, so in the 1980s Geoffrey Hinton and his colleagues popularized the concept called backpropagation. Because of backpropagation, neural networks became so efficient that most companies use them nowadays.

In the neural network below, the first layer is the input layer, the second layer is the hidden layer, and the third layer is the output layer. Every neuron in the input layer is connected to each neuron in the hidden layer, where some processing happens, and then we get the output according to our problem statement. In our case it is binary classification: whether the tumor is malignant or not.

An input layer, hidden layer, and output layer of a neural network

How does a Neural Network work?

Forward Propagation

A simple Neural Network

Suppose our problem is binary classification, where we have taken four inputs in the input layer represented by X0, X1, X2, and X3. Before these inputs reach the neuron, each input is assigned a weight, represented by W0, W1, W2, and W3. The neuron's output then passes through an activation function.

Two operations happen in a hidden neuron: first, we sum the product of each input value with its corresponding weight and add a bias.

Suppose ‘Y’ is the weighted sum computed by the hidden neuron. Then

Y = X0 W0 + X1 W1 + X2 W2 + X3 W3 + bi

Where bi = bias (it is used to prevent the output from being zero or neutral). We use a bias because sometimes the sum of the products of the inputs and their corresponding weights comes out to zero. In that case, if we pass Y through an activation function, we will not get a useful result, so the bias term helps Y stay non-zero.

Then Y is passed through an activation function. Suppose the output of the activation function is Z.

Z = Act(Y)

Suppose we apply the sigmoid activation function here; it outputs a value between 0 and 1. If the value of Z is less than 0.5, the neuron does not get activated; if the value of Z is greater than 0.5, the neuron gets activated. For simplicity we have taken only one neuron, but in practice there will be many neurons, and the activation function is applied to every neuron in the hidden layer. This type of propagation is called forward propagation.
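To make this concrete, here is a minimal sketch of the forward pass through a single hidden neuron, assuming made-up values for the inputs X0 to X3, the weights W0 to W3, and the bias:

```python
import math

def sigmoid(y):
    # Sigmoid activation: squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-y))

# Example inputs X0..X3 and weights W0..W3 (illustrative values only)
X = [0.5, 0.1, 0.8, 0.3]
W = [0.4, -0.2, 0.6, 0.1]
bias = 0.05

# Step 1: weighted sum of inputs plus bias  (Y = X0 W0 + X1 W1 + X2 W2 + X3 W3 + bi)
Y = sum(x * w for x, w in zip(X, W)) + bias

# Step 2: pass through the activation function  (Z = Act(Y))
Z = sigmoid(Y)

print(f"Y = {Y:.3f}, Z = {Z:.3f}")  # Z > 0.5 means the neuron activates
```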

Back Propagation:

We have seen the structure of a neural network. There are weights and biases (together called the parameters) that need to be adjusted for the neurons to activate correctly and thereby for the network to make a correct prediction. As an example, take a simple neural network with four input features X1, X2, X3, and X4, assigned weights W1, W2, W3, and W4 respectively. The inputs pass through a hidden layer (neuron), and the weight assigned after the hidden layer is W5.

We consider this a binary classification problem, so sigmoid is taken as the activation function; it converts the output to a value between 0 and 1. Suppose y is the actual value and ŷ is the predicted output. Next, we have to compare whether the predicted ŷ and the actual y are almost the same. If the predicted value differs from the actual value, then our loss function is

Loss = (y − ŷ)²

We will try to minimize this loss function.

Squaring makes the loss a positive value. Suppose the predicted value is 1 and the actual value is 0; our neural network's prediction is wrong and the loss will be 1, so we try to minimize this loss by adjusting the weights W1, W2, W3, and W4. This weight adjustment should be done in such a way that the predicted and actual values become the same. We can minimize our loss function using an optimizer.
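As a quick sketch of this squared-error idea (the predicted and actual values below are just illustrative):

```python
def squared_error(y_actual, y_pred):
    # Squaring keeps the loss positive regardless of the sign of the error
    return (y_actual - y_pred) ** 2

print(squared_error(0, 1))    # completely wrong prediction: loss = 1
print(squared_error(1, 0.9))  # close prediction: loss is about 0.01
```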

This is where backpropagation comes into the picture: we adjust the weights in such a way that the loss function decreases in each iteration. We update the old weights with new weights so that the loss function gets as close to zero as possible; in other words, the predicted value should become equal to the actual value.

We will represent the new weight by the suffix ‘new’:

Wnew = Wold − learning rate × dLoss/dWold

Gradient Descent

Gradient descent is one of the most widely used optimizers in backpropagation. It updates the weights in such a way that the loss function decreases in each iteration.
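Here is a minimal sketch of one gradient descent update for a single weight, assuming the squared-error loss and a sigmoid output as described above; the gradient follows from the chain rule, and the numbers are illustrative:

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

# Illustrative single-input, single-weight case
x, y_actual = 1.5, 1.0          # input value and actual label
w, b = 0.2, 0.0                 # current weight and bias
learning_rate = 0.1

# Forward pass
z = sigmoid(w * x + b)          # predicted output
loss = (y_actual - z) ** 2

# Backward pass: chain rule gives dLoss/dw = 2*(z - y) * z*(1 - z) * x
grad_w = 2 * (z - y_actual) * z * (1 - z) * x

# Gradient descent update: Wnew = Wold - learning_rate * dLoss/dW
w_new = w - learning_rate * grad_w
print(f"loss = {loss:.4f}, gradient = {grad_w:.4f}, updated weight = {w_new:.4f}")
```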

Error vs Training Steps

Suppose our loss is at the dark black point in the figure; then we have to minimize the loss, and the weight updates will keep going until we reach the global minimum. The gradient (slope) at that point is positive, so at each iteration we subtract the product of the learning rate and the derivative of the loss function with respect to the weight, which is positive. So in every iteration the weight will be reduced.

Gradient Descent Optimizer

On the other side, suppose the loss is at a point where the slope is negative; then, as per the new weight formula, the new weight will keep increasing until we reach the global minimum.

The reason behind the very small value of the learning rate is that if we take a higher value, the weight update may jump too far; sometimes it may jump over the global minimum point and never reach it.
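A small sketch of this overshooting behavior, using a stand-in quadratic loss L(w) = w² rather than the network's actual loss:

```python
def grad(w):
    # Derivative of the stand-in loss L(w) = w**2
    return 2 * w

for lr in (0.1, 1.1):  # a small vs. a too-large learning rate
    w = 5.0
    for _ in range(10):
        w -= lr * grad(w)
    # With lr = 0.1 the weight shrinks toward the minimum at 0;
    # with lr = 1.1 it overshoots and grows further away each step.
    print(f"learning rate {lr}: weight after 10 steps = {w:.2f}")
```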

There are hyperparameter optimization techniques for selecting an appropriate learning rate; in most cases, the learning rate is taken as 0.001. Forward propagation and backpropagation run one after the other until our cost function is minimized, and one such full pass over the training data is called one epoch.
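Putting the pieces together, here is a rough sketch of a training loop where each full pass over a tiny made-up dataset is one epoch (the learning rate is set higher than the typical 0.001 only so the progress is visible within a few epochs):

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

# Tiny illustrative dataset: a single feature with binary labels
data = [(0.2, 0), (0.8, 1), (0.5, 0), (0.9, 1)]
w, b = 0.0, 0.0
learning_rate = 0.1  # larger than the typical 0.001, purely for illustration

for epoch in range(100):            # each pass over all examples is one epoch
    total_loss = 0.0
    for x, y_actual in data:
        # Forward propagation
        z = sigmoid(w * x + b)
        total_loss += (y_actual - z) ** 2

        # Backpropagation: gradients of the squared error w.r.t. w and b
        grad = 2 * (z - y_actual) * z * (1 - z)
        w -= learning_rate * grad * x
        b -= learning_rate * grad

print(f"final weight = {w:.3f}, bias = {b:.3f}, last epoch loss = {total_loss:.3f}")
```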

Abhishek Kumar Pandey
M.Tech, Cochin University of Science and Technology, ML and AI Enthusiast