Deep Dive into Back Propagation (Part I)

Aung Kyaw Myint
Published in Analytics Vidhya
4 min read · Dec 3, 2019

This article is a continuation of the in-depth explanation of the feedforward process in a neural network. The link to that article can be found here.

Now that we have completed the feedforward pass, received an output, and calculated the error, we are ready to go backwards in order to change our weights with the goal of decreasing the network error. Going backwards from the output to the input while changing the weights is a process we call back propagation, which is essentially stochastic gradient descent computed using the chain rule.
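For a single weight feeding an output unit, the chain rule spelled out looks like this (the symbol y_j for the unit's output and this particular indexing are my own notation, not from the article):

```latex
\frac{\partial E}{\partial W_{ij}}
  = \frac{\partial E}{\partial y_j}\,
    \frac{\partial y_j}{\partial W_{ij}}
```

For weights deeper in the network, the same idea applies: we keep multiplying partial derivatives backwards, layer by layer, until we reach the weight in question.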

To implement a basic neural network, one doesn't really need a deep mathematical understanding, since we now have open-source tools. But to really understand how it works and to optimise our applications, it is always important to know the math.

Our goal is to find the set of weights that minimises the network error. We use an iterative process, presenting the network with one input at a time from our training set. During the feedforward pass for each input, we calculate the network error. We can then use this error to make a slight change to the weights in the correct direction, each time reducing the error by just a bit. We continue to do so until we determine the error is small enough. Now let's look at the gradient considerations.
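Here is a minimal sketch of that iterative process for a single linear neuron with a squared-error loss; the toy data, learning rate, and variable names are illustrative choices, not from the article:

```python
# Minimal sketch: present one input at a time, compute the error,
# and nudge the weights a little in the direction that reduces it.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # toy training inputs
true_w = np.array([0.5, -1.0, 2.0])
y = X @ true_w                         # toy targets

w = np.zeros(3)                        # weights to be learned
learning_rate = 0.01

for epoch in range(20):
    for x_i, y_i in zip(X, y):         # one training sample at a time
        y_hat = x_i @ w                # feedforward pass
        error = y_hat - y_i            # network error for this sample
        grad = error * x_i             # dE/dw for E = 0.5 * error**2
        w -= learning_rate * grad      # slight change in the correct direction

print(w)  # should end up close to true_w once the error is small enough
```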

Here you can find other good resources for understanding and tuning the Learning Rate:

Basically, back propagation boils down to calculating the partial derivative of the error E with respect to each of the weights, and then adjusting each weight according to the calculated value of ΔWij. These calculations are done for each layer.
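In symbols, with α standing for the learning rate (the article does not fix a symbol for it), the weight update is:

```latex
\Delta W_{ij} = -\alpha\,\frac{\partial E}{\partial W_{ij}},
\qquad
W_{ij} \leftarrow W_{ij} + \Delta W_{ij}
```

The minus sign moves each weight against the gradient, i.e. in the direction that decreases the error.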

Overfitting

When we minimise the network error using back propagation, we may either properly fit the model to the data or overfit. Generally speaking, when we have a finite training set, there is a risk of overfitting.

Overfitting means that our model fits the training data too closely. In other words, we have over-trained the model (or network) to fit the data. As a result, we unintentionally also model the noise, or random elements, in our training set. If that happens, our model will not generalise well when tested on new inputs.

There are generally two main approaches to addressing the overfitting problem.

  1. Stop the training process early
  2. Use regularisation

1. Stopping the Training Process Early

When we stop the training process early, we do so in the region where the network begins to overfit. By doing so, we reduce the degradation in performance on the test set. It would be ideal if we knew precisely when we should stop the training process. One way to determine when to stop is by carving a small dataset out of the training set, which we call the validation set. Assuming that the accuracy on the validation set is similar to that on the test set, we can use it to estimate when the training should stop. The drawback of this approach is that we end up with fewer samples to train our model on, so our training set is smaller.
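Below is a minimal sketch of early stopping driven by a validation set; the callables train_one_epoch and validation_loss are hypothetical placeholders, not part of the article:

```python
# Stop training once the validation error has stopped improving for a
# while, i.e. in the region where the network begins to overfit.
def train_with_early_stopping(train_one_epoch, validation_loss,
                              max_epochs=100, patience=5):
    best_loss = float("inf")
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch()                    # one pass over the training set
        current_loss = validation_loss()     # error on the held-out validation set

        if current_loss < best_loss:
            best_loss = current_loss         # still improving: keep training
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1  # validation error is rising

        if epochs_without_improvement >= patience:
            break                            # likely overfitting from here on
    return best_loss
```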

2. Use of Regularisation

Regularisation means that we impose a constraint on the training of the network such that better generalisation can be achieved. Dropout is a widely used regularisation scheme that helps in this way.
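As a sketch of the idea (inverted dropout written with NumPy; the keep probability of 0.8 and the function name are my own choices, not from the article):

```python
# During training, randomly zero out a fraction of activations so the
# network cannot rely too heavily on any single unit.
import numpy as np

def dropout(activations, keep_prob=0.8, training=True):
    if not training:
        return activations                    # no units dropped at test time
    mask = np.random.rand(*activations.shape) < keep_prob
    # Scale the surviving activations so their expected value is unchanged.
    return activations * mask / keep_prob

h = np.ones((1, 5))
print(dropout(h))  # some activations zeroed, the rest scaled by 1/0.8
```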

Since partial derivatives are the key mathematical concept used in backpropagation, it’s important that you feel confident in your ability to calculate them. Once you know how to calculate basic derivatives, calculating partial derivatives is easy to understand.
For more information on partial derivatives, use the following link.

For calculation purposes in future quizzes of the lesson, you can use the following link as a reference for common derivatives.
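As a quick illustrative example of a partial derivative (not from the article): differentiate with respect to one variable while treating the other as a constant.

```latex
f(x, y) = x^{2} y
\quad\Longrightarrow\quad
\frac{\partial f}{\partial x} = 2xy,
\qquad
\frac{\partial f}{\partial y} = x^{2}
```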

Content Credit: Udacity Deep Learning Program

The link for Part II can be found here.
