Playing with Backpropagation Algorithm Intuition

Sourav Kumar
Secure and Private AI Writing Challenge
6 min read · Jun 16, 2019

Okay, so before we get into the technicalities of the algorithm, let's imagine a bunch of students studying for an exam, as shown below.

(image credits : Copyright 2007, Mike Watson Images Limited)

Now let's see what happens when they go to take this semester's first exam, which is on the coming Monday!

(image credits : writepass.com)

But guess what: they didn't perform well in the exam, even after all that rigorous studying! Now they start discussing the reasons for their failure.

What do you think could be the reasons for their failure, even after lots of cramming, group discussions, nights spent in the library, and so on?

Let's think together. So, what were the variables involved here?

  • Maybe they couldn't cover everything because they started preparing late,
  • maybe they weren't studying the right things and spent too much time on irrelevant ones,
  • maybe the library was not the best place for them to study,
  • maybe the environment there was not comfortable and they were too distracted

and what not. It could be any one of these, or all of them. So, what did they do for their next exams? One of the smarter students among them proposed changing their study timings and their environment, such as moving to a very quiet place (maybe a corner spot), and so on.

You see what they have done: they learnt from their mistakes and changed some of the variables involved in their preparation process, and finally they performed better relative to the earlier exams.

This step of learning from your mistakes, changing some of the variables involved in producing the output, and then repeating the process until you reach the desired precision (or a low enough error rate) is called the backpropagation algorithm (in layman's terms).

According to wikipedia -

[Backpropagation algorithms are a family of methods used to efficiently train artificial neural networks (ANNs) following a gradient descent approach that exploits the chain rule. The main feature of backpropagation is its iterative, recursive and efficient method for calculating the weight updates to improve the network until it is able to perform the task for which it is being trained.]

Let's understand the mathematical intuition behind this. For that, let's first understand "feedforward" in neural networks.

All the calculations below assume that you already know what a neural network is, what the gradient descent algorithm is, and what the sigmoid activation function is. (If not, see the references at the end of this article.)

[This is a neural network consisting of one input layer, one hidden layer, and one output layer.] [Notation: weights are written w[i][j]^(k), where i is the neuron in the outgoing layer, j is the neuron in the incoming layer, and k is the layer (1 or 2); x1 and x2 are the input neurons, h1 and h2 are the neurons of the hidden layer, o is the output neuron, and 1 denotes the bias.]

Now let's see, layer by layer, how the neural network computes the output for this particular architecture.

First, let's see what happens in the first layer (from the input to the hidden layer):

output from layer 1: $h = \sigma(W^{(1)} x + b^{(1)})$, where $W^{(1)}$ is the weight matrix of the first layer and $b^{(1)}$ its bias:

$$W^{(1)} = \begin{pmatrix} w^{(1)}_{11} & w^{(1)}_{12} \\ w^{(1)}_{21} & w^{(1)}_{22} \end{pmatrix}$$

Now, the second layer takes the hidden activations $h$ (not the raw inputs) as its input:

output from layer 2: $\hat{y} = \sigma(W^{(2)} h + b^{(2)})$, where $W^{(2)}$ is the weight matrix of the second layer and $b^{(2)}$ its bias:

$$W^{(2)} = \begin{pmatrix} w^{(2)}_{11} & w^{(2)}_{12} \end{pmatrix}$$

Hence, overall, the output is a composition of functions, as represented below:

$$\hat{y} = \sigma\big(W^{(2)} \, \sigma(W^{(1)} x + b^{(1)}) + b^{(2)}\big)$$

Code for the above will look something like the sketch below (a minimal NumPy version; the function and variable names are illustrative assumptions):
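```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Feedforward pass for the 2-2-1 network above.
# Assumed shapes: x is (2,), W1 is (2, 2), b1 is (2,), W2 is (1, 2), b2 is (1,).
def feedforward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)      # hidden activations (h1, h2)
    y_hat = sigmoid(W2 @ h + b2)  # output neuron o
    return y_hat
```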

So, we have computed the outputs of the neural network through the feedforward algorithm. It's called "feedforward" because the values flow in the forward direction only; there is no loop or backward movement in any part of the network.

Now, we want to check that we are going in the correct direction. So, what do you do when you want to check whether you have solved a particular question right or wrong? You compare your solution with the exact solution known to you.

We will do exactly that. We simply compare our predicted values with the correct values (known to us) and calculate the error (here, we use the cross-entropy loss function / error function).

[Error function: binary cross-entropy, $E = -\frac{1}{m}\sum_{i=1}^{m}\big(y_i \ln(\hat{y}_i) + (1 - y_i)\ln(1 - \hat{y}_i)\big)$]

Now, we put the values calculated above, together with the known values, into the error function and compute the error. At this point, we want to know how to reach the correct values, given that we know our current values are wrong. For that, we have to change our variables (here, the weights of the neural network) in such a way that the error is reduced.
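As a sketch (assuming the same NumPy setup as in the feedforward snippet), the error for a single prediction could be computed like this:

```python
# Binary cross-entropy for one example: y is the known label (0 or 1),
# y_hat is the network's prediction in (0, 1).
def cross_entropy(y, y_hat):
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
```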

So, let’s do that.

For that, we need to apply the gradient descent algorithm, and for that we need the derivatives of the error function with respect to the weights.

[Taken from the Intro to Deep Learning course on Udacity]

Before that, we must understand how to calculate derivatives of composite functions (a function of a function):

We take the derivatives starting from the last (outermost) function and work all the way back to the first, multiplying them together; this is the chain rule.
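In symbols, for example (applying the chain rule to a sigmoid of a linear function, the pattern our network is built from, and using $\sigma' = \sigma(1 - \sigma)$):

$$\frac{d}{dx}\, g(h(x)) = g'(h(x)) \cdot h'(x), \qquad \frac{d}{dx}\, \sigma(wx + b) = \sigma(wx + b)\big(1 - \sigma(wx + b)\big) \cdot w$$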

Now it's time to calculate one of the derivatives:

$$\frac{\partial E}{\partial w^{(1)}_{11}} = \frac{\partial E}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial h} \cdot \frac{\partial h}{\partial h_1} \cdot \frac{\partial h_1}{\partial w^{(1)}_{11}}$$

where $h = (h_1, h_2)$ are the hidden activations and $\hat{y} = \sigma(W^{(2)} h + b^{(2)})$ is the output.

Let's understand the formula intuitively:

When we calculate the gradient of the error function, we are interested in how sensitive the error function is to a change in the weights. When we change the weight $w^{(1)}_{11}$ by a small amount, it changes $h_1$ by a small amount, which changes $h$, which finally changes the output (and hence the error) by a small amount.

[Figure: the intermediate derivatives along this chain]

and apply gradient descent:

$$w^{(k)}_{ij} \leftarrow w^{(k)}_{ij} - \alpha \, \frac{\partial E}{\partial w^{(k)}_{ij}}$$

where $\alpha$ is the learning rate.

So, basically, it updates the weight matrices layer by layer, from the last layer back to the first, adjusting each weight in proportion to how much it contributed to the error.
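To make this concrete, here is a minimal sketch of one backpropagation step for the 2-2-1 network, reusing the sigmoid and the shape assumptions from the feedforward sketch above (for cross-entropy combined with a sigmoid output, the derivative of the error with respect to the output pre-activation simplifies to $\hat{y} - y$):

```python
# One backpropagation step: forward pass, then gradients via the chain rule.
def backprop(x, y, W1, b1, W2, b2):
    # Forward pass, keeping the intermediate values.
    h = sigmoid(W1 @ x + b1)      # hidden activations (h1, h2)
    y_hat = sigmoid(W2 @ h + b2)  # network output

    # Cross-entropy + sigmoid output: dE/d(output pre-activation) = y_hat - y.
    delta_out = y_hat - y

    # Gradients for the second (hidden -> output) layer.
    dW2 = np.outer(delta_out, h)
    db2 = delta_out

    # Propagate the error back through W2 and the hidden sigmoids,
    # using sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z)) = h * (1 - h).
    delta_hidden = (W2.T @ delta_out) * h * (1 - h)

    # Gradients for the first (input -> hidden) layer.
    dW1 = np.outer(delta_hidden, x)
    db1 = delta_hidden

    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}
```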

So, for example, suppose the only weights that need to change are:

$w^{(2)}_{11}$, $w^{(1)}_{11}$, $w^{(1)}_{12}$, $w^{(1)}_{22}$, leaving the other weights unchanged.

[Figure: the network with the changed weights highlighted]

We keep repeating these steps, moving forward and backward through the network in each epoch (one pass over the entire dataset), until we reach the desired precision (or a low enough loss).
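Putting it all together, a training loop under the same assumptions might look like this (the dataset, number of epochs, and learning rate below are illustrative):

```python
learning_rate = 0.5

for epoch in range(1000):         # one epoch = one pass over the whole dataset
    for x, y in dataset:
        grads = backprop(x, y, W1, b1, W2, b2)  # forward + backward pass
        W1 -= learning_rate * grads["W1"]       # gradient descent updates
        b1 -= learning_rate * grads["b1"]
        W2 -= learning_rate * grads["W2"]
        b2 -= learning_rate * grads["b2"]
```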

This completes our backpropagation algorithm.

Give yourself a treat if you've made it this far, because you should now understand the crux of backpropagation.

— — — — — — — — — — — — — — — — — — -

For more details on these concepts, refer to the following:

For gradient descent intuition:

— — — — — — — — — — — — — — — — — — — — — — — -

Images: copyright remains with their respective owners.

For more such awesome stories, you can subscribe or follow me.

Feel free to share your insights on backpropagation in the comments section below.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

[THANKS FOR THE CLAPS]

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —
