Understand the Math for Neural Networks

A detailed mathematical explanation of Gradient Descent and Back-propagation

Y Tech
Aug 16, 2019

In this post, I will discuss the details of Gradient Descent and Back-propagation in neural networks, and help you understand why there is a Vanishing Gradient Problem. In my earlier post (How to implement Gradient Descent in Python), I discussed a Python implementation of Gradient Descent. In this post, I will walk you step by step through the math behind neural networks.

Assumptions

For the neural networks discussed in this post, we will make the following assumptions:

  • The sigmoid function is used as the activation function
  • Cross-entropy is used as the error function:
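For reference, writing ŷ for the predicted probability and y ∈ {0, 1} for the true label, these two functions are:

S(x) = \frac{1}{1 + e^{-x}}

E = -\big[\, y \ln \hat{y} + (1 - y) \ln (1 - \hat{y}) \,\big]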

Gradient Descent: The Math

Assume the following scenario: we need to classify m points into 2 classes, and each point has p features, represented by (X1, X2, ..., Xp). Say we are using a one-layer neural network to do this.

We feed in point P1 (with features X1, X2, ..., Xp); the neuron computes the weighted sum WX+b. Applying the activation function to that sum, we get the probability that P1 is in class 1, ŷ11 = S(WX+b), and the probability that P1 is in class 2, ŷ12 = 1 − S(WX+b).
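Spelled out, the computation for P1 is:

\hat{y}_{11} = S\Big(\sum_{j=1}^{p} W_j X_j + b\Big), \qquad \hat{y}_{12} = 1 - \hat{y}_{11}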

Up to this point, everything is very straightforward.

The loss function we are using is cross-entropy (shown in the assumptions above). Our job now is to minimize the loss, and all we need to do is:

  1. Find the derivative of the loss function with respect to the weights
  2. With the derivative we found, apply Gradient Descent to update the weights in small steps toward the minimum error (the update rule is shown just below)
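Concretely, for a learning rate α (a small step size we choose), each update in step 2 is:

W_j \leftarrow W_j - \alpha \frac{\partial E}{\partial W_j}, \qquad b \leftarrow b - \alpha \frac{\partial E}{\partial b}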

Now let’s start finding the derivative of the loss function for this point P1 with respect to the weight Wj:
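Using the chain rule together with the handy identity S′(x) = S(x)(1 − S(x)), and writing ŷ for ŷ11:

\frac{\partial E}{\partial W_j} = \frac{\partial E}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial W_j} = \frac{\hat{y} - y}{\hat{y}(1 - \hat{y})} \cdot \hat{y}(1 - \hat{y})\, X_j = (\hat{y} - y)\, X_j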

Similarly, we can also get the derivative of the loss function with respect to the bias b:
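The chain is the same except for the last factor, since ∂(WX + b)/∂b = 1:

\frac{\partial E}{\partial b} = \hat{y} - y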

Putting these together, we get the gradient of the error:
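Stacking the partial derivatives for all the weights and the bias into one vector:

\nabla E = (\hat{y} - y)\,\big( X_1, X_2, \ldots, X_p, 1 \big)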

Back Propagation: The Math

The Backpropagation logic is described below:

  1. Do a feedforward operation.
  2. Compare the output of the model with the desired output.
  3. Calculate the error.
  4. Run the feedforward operation backwards (backpropagation) to spread the error to each of the weights.
  5. Use this to update the weights and get a better model (a code sketch of this loop follows the list).
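To make these five steps concrete, here is a minimal NumPy sketch of one training step for a small two-layer network. The sizes (3 inputs, 2 hidden units, 1 output) and all variable names are illustrative assumptions, not the exact network from this post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical sizes: 3 input features, 2 hidden units, 1 output unit.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))    # input -> hidden weights
b1 = np.zeros(2)
W2 = rng.normal(size=(1, 2))    # hidden -> output weights
b2 = np.zeros(1)

x = np.array([0.5, -1.0, 2.0])  # one training point
y = 1.0                         # its true label
lr = 0.1                        # learning rate (step size)

# 1. Feedforward
h = sigmoid(W1 @ x + b1)                 # hidden activations
y_hat = sigmoid(W2 @ h + b2)[0]          # predicted probability

# 2./3. Compare with the desired output: cross-entropy error
loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# 4. Backpropagation: spread the error back to each weight.
delta2 = y_hat - y                       # dE/dz at the output (sigmoid + cross-entropy)
dW2 = delta2 * h                         # gradients for hidden -> output weights
db2 = delta2
delta1 = delta2 * W2[0] * h * (1 - h)    # note the extra sigmoid-derivative factor
dW1 = np.outer(delta1, x)                # gradients for input -> hidden weights
db1 = delta1

# 5. Update the weights: one gradient descent step
W2 -= lr * dW2.reshape(W2.shape)
b2 -= lr * db2
W1 -= lr * dW1
b1 -= lr * db1
```

Note how delta1 is delta2 multiplied by a weight and a sigmoid derivative; this shrinking factor is exactly what the last section of this post is about.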

Now let us try to do backpropagation on this two-layer neural network. We will calculate the weight change steps for:

  • Weights between hidden layer and output layer
  • Weights between input layer and hidden layer

Let us first try to calculate the gradient descent step for W21 (the first weight of the layer between the hidden layer and the output layer):
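Writing h1 for the activation of the first hidden unit (my notation), the chain rule through the output neuron gives:

\frac{\partial E}{\partial W_{21}} = \frac{\partial E}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial W_{21}} = (\hat{y} - y)\, h_1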

Similarly, we can find the gradients of the other weights between these two layers:
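In general, for the weight W2j connecting hidden unit hj to the output:

\frac{\partial E}{\partial W_{2j}} = (\hat{y} - y)\, h_j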

Now let us calculate the gradient descent step for a weight between the input layer and the hidden layer, W11:
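The chain now runs through both layers: the error reaches W11 via the output neuron and then via hidden unit h1, picking up one extra weight and one extra sigmoid derivative on the way:

\frac{\partial E}{\partial W_{11}} = (\hat{y} - y)\, W_{21}\, h_1 (1 - h_1)\, X_1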

Similarly, we can get the gradients for the other weights between these two layers; I will not include them here.

However, there are a few important points we need to pay attention to:

  • As we can see, the gradients between the input layer and the hidden layer are much smaller than the ones in the top layer: each of them is a top-layer gradient multiplied by a weight, an input, and a sigmoid derivative.
  • The sigmoid derivative has a maximum value of 0.25, as shown below.
  • This means each such layer shrinks the gradients by at least 75%; this is called the Vanishing Gradient Problem.
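Since S′(x) = S(x)(1 − S(x)) and S(x) ∈ (0, 1), the product peaks when S(x) = 1/2, i.e. at x = 0:

S'(x) = S(x)\big(1 - S(x)\big) \le \tfrac{1}{4}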

I will prepare another post to discuss the Vanishing Gradient Problem in detail, along with the different techniques used to deal with it.
