Back propagation in Neural Networks

Pvreddy
IBM Data Science in Practice
3 min read · Dec 10, 2020


When I started learning about neural networks, I thought that back prop was some kind of magic that optimizes the parameters of the network. Out of curiosity I did some research and figured out how back prop works internally. It is really simple and elegant, and I hope you will feel the same after reading this article.

What is back prop, and why do we need it?

In simple words, we can think of back prop as an algorithm that computes the derivatives of the loss with respect to the parameters, so that we can update the parameters and, in turn, minimize the cost/loss.
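For instance, with plain gradient descent and a learning rate α, once back prop gives us dW = ∂L/∂W and db = ∂L/∂b for a layer, the update is simply W := W − α·dW and b := b − α·db. (The article does not commit to a particular optimizer; gradient descent is just the simplest example.)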

I will derive back prop for a single input in a 2-layer neural network. If you are interested, you can follow the same process and try to extend it to multiple inputs.

2-layer neural network

The above two networks are equivalent; I have added the second one for better understanding.

x1, x2, x3 are the features of the single input.

Notation we will follow:

X - input

Z[1] - linear output (pre-activation) of layer 1

A[1] - activation output of layer 1

Z[2] - linear output (pre-activation) of layer 2

A[2] - predicted output of the neural network

n[l] - number of units in layer l

W[1] - weights of layer 1

W[2] - weights of layer 2

b[1] - bias of layer 1

b[2] - bias of layer 2

W[1] dimension - 3*3

b[1] dimension - 3*1

W[2] dimension - 1*3

b[2] dimension - 1*1

General formula for the dimension of W[l] - n[l] * n[l-1]

General formula for the dimension of b[l] - n[l] * 1
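As a quick sanity check on these shapes, here is a minimal NumPy sketch that randomly initializes the parameters of this 3-3-1 network and prints the dimensions (the layer sizes come from the figure above; everything else is just illustrative):

```python
import numpy as np

# layer sizes from the figure: 3 input features, 3 hidden units, 1 output unit
n = [3, 3, 1]

# W[l] has shape (n[l], n[l-1]); b[l] has shape (n[l], 1)
W1 = np.random.randn(n[1], n[0]) * 0.01   # (3, 3)
b1 = np.zeros((n[1], 1))                  # (3, 1)
W2 = np.random.randn(n[2], n[1]) * 0.01   # (1, 3)
b2 = np.zeros((n[2], 1))                  # (1, 1)

print(W1.shape, b1.shape, W2.shape, b2.shape)  # (3, 3) (3, 1) (1, 3) (1, 1)
```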

L(A[2], Y) represents the loss function.
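The article does not spell out a particular loss. If we assume a binary-classification setup with a sigmoid output unit (my assumption, not something the article fixes), a common choice is the cross-entropy loss L(A[2], Y) = −(Y log A[2] + (1 − Y) log(1 − A[2])).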

Basic approach to train a neural network:

Randomly initialize all the parameters

Perform forward prop (note: cache the intermediate results Z[l] and A[l] so that back prop can reuse them)

Compute the loss

Perform back prop

Update the parameters

Repeat the process until the loss converges (a minimal code sketch of this whole loop is given below).
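Putting the steps together, here is a minimal NumPy sketch of this loop for a single training example, matching the 3-3-1 network above. It assumes a tanh activation in the hidden layer, a sigmoid output, and the cross-entropy loss mentioned earlier; these specific choices are mine, since the article leaves the activations and loss open.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# single training example: 3 features, 1 binary label
x = np.array([[0.5], [-1.2], [0.3]])   # shape (3, 1)
y = np.array([[1.0]])                  # shape (1, 1)

# randomly initialize all the parameters
W1, b1 = np.random.randn(3, 3) * 0.01, np.zeros((3, 1))
W2, b2 = np.random.randn(1, 3) * 0.01, np.zeros((1, 1))
alpha = 0.1  # learning rate

for i in range(1000):
    # forward prop (keep the Z/A values as the "cache" for back prop)
    Z1 = W1 @ x + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)

    # compute the loss (cross-entropy for a single example)
    loss = -(y * np.log(A2) + (1 - y) * np.log(1 - A2)).item()

    # back prop: derivatives of the loss w.r.t. each parameter
    dZ2 = A2 - y                          # sigmoid + cross-entropy simplify to this
    dW2 = dZ2 @ A1.T
    db2 = dZ2
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)    # tanh'(Z1) = 1 - A1^2
    dW1 = dZ1 @ x.T
    db1 = dZ1

    # update the parameters
    W1, b1 = W1 - alpha * dW1, b1 - alpha * db1
    W2, b2 = W2 - alpha * dW2, b2 - alpha * db2

print(loss)  # approaches 0 after repeatedly fitting this one example
```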

Back prop: computing the derivatives
Step-by-step derivation of back prop
End of back prop

As the derivation above shows, we need to iterate this process over each input in the training data until we reach the parameter values that minimize the loss function.

I hope this article gives you a basic understanding of the internal working of back prop.

I am not sure how to type the differentiation notation with a keyboard, so I derived the whole back prop on a sheet of paper and copied the images here.

I am not good at drawing… please don't mind that.

Cheers……

