Understanding Backpropagation And Gradient Checking

shiv pratap rai
The Good Food Economy
6 min read · Jun 25, 2023


Introduction

A neural network is like a smart pattern recognizer that can make predictions based on real-world data. It learns by using a cool algorithm called backpropagation. This algorithm is all about fixing mistakes and making our predictions even more accurate. It does this by adjusting the weights of the neural network, kind of like fine-tuning an instrument. And guess what? The tag team of gradient descent and backpropagation works together to make this fine-tuning process happen. It’s like they’re working hand in hand to make our neural network super awesome!

When it comes to neural networks, the backpropagation step can be a tricky area where mistakes are more likely to happen. That’s why having a method to debug this step can be a real lifesaver and save you from headaches. Let me introduce you to gradient checking! It’s a neat technique that approximates the gradient using a numerical approach. Here’s the cool part: if the estimated gradient is close to the calculated gradients, it means that the backpropagation was implemented correctly! This can be a game-changer, making your debugging process much smoother and more efficient.

Let’s dive into the details and see how it can be implemented in a project.

Part 1: Building the Neural Network

To create our awesome neural network, we need to bring in two crucial modules: forward propagation and backward propagation. Don’t worry, it’s not as complicated as it sounds! Let’s break it down:

  1. Forward propagation: This module is like the “going forward” part of our network. It takes in the input data, processes it through the layers of the neural network, and produces an output. It’s like going on a journey, where the data moves forward through the network, getting transformed and processed along the way.
  2. Backward propagation: Now, this module is all about learning from our mistakes. It’s like the “oops, let’s fix that” part of our network. Backward propagation works by calculating the gradients of the network’s parameters (weights and biases) based on the error between the predicted output and the actual output. This allows our network to learn and adjust its parameters to make better predictions in the future.

So, by implementing these two modules — we’ll be well on our way to building a powerful neural network. They work hand in hand to make our network learn, adapt, and improve its performance. It’s like having a dynamic duo working together to make our neural network shine!

Alright, it’s time to roll up our sleeves and have some fun with our very own dummy neural network! Let’s dive right in and bring this amazing creation to life. Get ready to witness the magic of neural networks in action!

Forward Propagation

Great! Now, let’s take a closer look at our awesome network. We’ve already set up the forward propagation module, and it’s ready to rock!

In this module, we have some input features: f1, f2, f3, f4, and f5. These features are like the building blocks of our network, providing the necessary information for making predictions.

We also have nine weights: w1, w2, w3, w4, w5, w6, w7, w8, and w9. These weights play a vital role in shaping how our network processes the input features and generates the final output.

Speaking of the final output, we have a special value called L. It’s calculated using a formula: (Y - Y')². This formula helps us measure the difference between the predicted value (Y') and the actual value (Y). It’s like evaluating how well our network is doing at making predictions.

In the code below, you can follow each step of our network’s forward propagation. It’s like a guided tour that shows how our network moves forward, processing the features and eventually generating the output.
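The original listing isn’t reproduced here, so the snippet below is a minimal sketch of what such a forward pass could look like. The exact wiring of the nine weights to the five features (the hidden units h1, h2, h3 and the sigmoid activation) is an assumption made for illustration; the important part is that every intermediate value of the computational graph is cached so backpropagation can reuse it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_propagation(f, w, y):
    """Toy forward pass: 5 features f[0..4], 9 weights w[0..8], squared-error loss.
    The graph structure here is assumed, not the article's original code."""
    h1 = sigmoid(w[0] * f[0] + w[1] * f[1])                # first hidden unit
    h2 = sigmoid(w[2] * f[2] + w[3] * f[3] + w[4] * f[4])  # second hidden unit
    h3 = sigmoid(w[5] * h1 + w[6] * h2)                    # third hidden unit
    y_hat = w[7] * h3 + w[8]                               # predicted output Y'
    loss = (y - y_hat) ** 2                                # L = (Y - Y')^2
    cache = {"h1": h1, "h2": h2, "h3": h3, "y_hat": y_hat} # saved for backprop
    return loss, cache
```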

Backward Propagation

After our network’s forward propagation journey, we’ve got some exciting things to work with. We have the computational graph’s output at each step and the loss.

Now, it’s time to switch gears and dive into the backpropagation phase. This is where the real magic happens! We need to calculate something called the gradient of each weight. Don’t worry, it’s not as daunting as it sounds.

To do this, we’ll calculate the partial derivative of the loss with respect to each weight. It’s like teasing apart the impact of each weight on the overall loss.

By calculating these gradients, we’ll have a clear picture of how each weight influences the performance of our network. It’s like unraveling a puzzle and finding the missing pieces to make our network even better.

The code below gives a clear picture of the gradient calculation with respect to each weight.
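As with the forward pass, the original listing isn’t shown here, so this is a hedged sketch that matches the assumed graph above (numpy is already imported there): each gradient is simply the chain rule applied from the loss back to the corresponding weight.

```python
def backward_propagation(f, w, y, cache):
    """Gradients of L = (y - y_hat)^2 w.r.t. the nine weights of the assumed graph."""
    h1, h2, h3, y_hat = cache["h1"], cache["h2"], cache["h3"], cache["y_hat"]
    dL_dyhat = -2.0 * (y - y_hat)            # dL/dY'
    dw8 = dL_dyhat * h3                      # Y' = w8*h3 + w9
    dw9 = dL_dyhat * 1.0
    dz3 = dL_dyhat * w[7] * h3 * (1 - h3)    # back through h3's sigmoid
    dw6 = dz3 * h1
    dw7 = dz3 * h2
    dz1 = dz3 * w[5] * h1 * (1 - h1)         # back through h1's sigmoid
    dz2 = dz3 * w[6] * h2 * (1 - h2)         # back through h2's sigmoid
    dw1, dw2 = dz1 * f[0], dz1 * f[1]
    dw3, dw4, dw5 = dz2 * f[2], dz2 * f[3], dz2 * f[4]
    return np.array([dw1, dw2, dw3, dw4, dw5, dw6, dw7, dw8, dw9])
```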

Part 2: Gradient Checking

Now that you’re familiar with both forward propagation and backward propagation, you’re armed with some powerful knowledge! In the backward_propagation() step, you’ve calculated the gradients for each weight, which is fantastic progress. But here’s the fun part: it’s time to put those gradients to the test and see if they’re correct!

Imagine it as a little game of detective work. We want to make sure our gradients are on point and leading us in the right direction. So, let’s check them out and ensure they’re accurate.

By verifying the correctness of our gradients, we can have peace of mind that our network is learning and adjusting in the right way. It’s like double-checking our answers to make sure we didn’t make any mistakes.

A bit of Calculus:

We know that the derivative of any function f with respect to a variable w can be written as:

df/dw = lim (ε → 0) [f(w + ε) − f(w − ε)] / (2ε)

  • The definition above can be used as a numerical approximation of the derivative. Taking an epsilon small enough, the calculated approximation will have an error on the order of epsilon squared. In other words, if epsilon is 0.001, the approximation will be off by about 0.000001.

Let’s understand with a simple example: f(w1, w2, x1, x2) = w1²·x1 + w2·x2

From the above function, let’s assume w1 = 1, w2 = 2, x1 = 3, x2 = 4. The gradient of f w.r.t. w1 is:

∂f/∂w1 = 2·w1·x1 = 2·1·3 = 6

Let’s calculate the approximate gradient of w1 using the formula above, with ε = 0.0001:

approx ∂f/∂w1 = [f(w1 + ε, w2, x1, x2) − f(w1 − ε, w2, x1, x2)] / (2ε) = (11.00060003 − 10.99940003) / 0.0002 ≈ 5.999999999994898 (i.e. 6, up to floating-point error)

Then, we apply the following formula for the gradient check:

gradient_check = ‖grad_approx − grad‖₂ / (‖grad_approx‖₂ + ‖grad‖₂)

The equation above is basically the Euclidean distance normalized by the sum of the norms of the two vectors. The normalization guards against the case where one of the vectors is very small. As a value for epsilon, we usually opt for 1e-7. Therefore, if the gradient check returns a value less than 1e-7, the backpropagation was very likely implemented correctly. Otherwise, there is potentially a mistake in your implementation. If the value exceeds 1e-3, there is almost certainly a bug in the code.

gradient_check = (6 − 5.999999999994898) / (6 + 5.999999999994898)

gradient_check = 4.2514140356330737e-13
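As a quick sanity check, the toy example above takes only a few lines of Python; the function and the numbers are exactly the ones used in the worked example:

```python
def f(w1, w2, x1, x2):
    return w1 ** 2 * x1 + w2 * x2

w1, w2, x1, x2, eps = 1.0, 2.0, 3.0, 4.0, 1e-4

analytic = 2 * w1 * x1                                                   # df/dw1 = 2*w1*x1 = 6
approx = (f(w1 + eps, w2, x1, x2) - f(w1 - eps, w2, x1, x2)) / (2 * eps)  # centered difference
gradient_check = abs(analytic - approx) / (abs(analytic) + abs(approx))

print(analytic, approx, gradient_check)  # 6.0, ~5.999999999995, ~4.25e-13
```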

Code:
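The original notebook code isn’t reproduced here, so below is a hedged sketch of what gradient checking could look like for the toy network from Part 1 (forward_propagation() and backward_propagation() refer to the assumed sketches above): perturb each weight one at a time, recompute the loss, build the numerical gradient vector, and compare it with the analytic gradients.

```python
def gradient_checking(f, w, y, eps=1e-7):
    """Compare analytic gradients with centered-difference approximations."""
    _, cache = forward_propagation(f, w, y)
    grad_analytic = backward_propagation(f, w, y, cache)

    grad_approx = np.zeros_like(w, dtype=float)
    for i in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps                      # nudge weight i up ...
        w_minus[i] -= eps                     # ... and down
        loss_plus, _ = forward_propagation(f, w_plus, y)
        loss_minus, _ = forward_propagation(f, w_minus, y)
        grad_approx[i] = (loss_plus - loss_minus) / (2 * eps)

    # Euclidean distance normalized by the sum of the norms of the two vectors
    numerator = np.linalg.norm(grad_approx - grad_analytic)
    denominator = np.linalg.norm(grad_approx) + np.linalg.norm(grad_analytic)
    return numerator / denominator

# Example usage with random features, weights and a target value
rng = np.random.default_rng(0)
features = rng.normal(size=5)
weights = rng.normal(size=9)
print(gradient_checking(features, weights, y=1.0))  # should be well below 1e-7
```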

Congratulations! You’ve unlocked the power of gradient checking to debug your very own neural network. This is especially useful if you are building a neural network without the use of a framework. So keep up the great work, and let gradient checking be your secret weapon in the realm of neural network building!

In a future post, I will show how to optimize the loss of neural networks with the use of Optimizers.
