Batch Normalization and ReLU for solving Vanishing Gradients

Lavanya Gupta · Published in Analytics Vidhya · Apr 26, 2021

A logical, sequential roadmap to understanding advanced concepts in training deep neural networks.

Agenda

We will break our discussion into 4 logical parts that build upon each other. For the best reading experience, please go through them sequentially:

1. What is Vanishing Gradient? Why is it a problem? Why does it happen?
2. What is Batch Normalization? How does it help with the Vanishing Gradient problem?
3. How does ReLU help with the Vanishing Gradient problem?
4. Batch Normalization for Internal Covariate Shift

Vanishing Gradient

1.1 What is vanishing gradient?

First, let’s understand what vanishing means:

Vanishing means that a value keeps moving towards 0 but never actually becomes 0.

Vanishing gradient refers to the fact that in deep neural networks, the backpropagated error signal (gradient) typically decreases exponentially as a function of the distance from the last layer.

In other words, the useful gradient information from the end of the network fails to reach the beginning of the network.
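
To make this concrete, here is a minimal sketch (assuming PyTorch; the depth, width, and dummy data are arbitrary illustrative choices) that stacks many sigmoid layers and prints the gradient norm of each layer's weights after a single backward pass. The norms typically shrink as we move from the last layer back toward the first, which is the vanishing gradient effect in action.

```python
# Minimal illustration of vanishing gradients (assumes PyTorch).
# Depth, width, and the dummy batch below are arbitrary choices for demo purposes.
import torch
import torch.nn as nn

torch.manual_seed(0)

depth, width = 20, 64
layers = []
for _ in range(depth):
    layers += [nn.Linear(width, width), nn.Sigmoid()]
model = nn.Sequential(*layers)

x = torch.randn(32, width)   # dummy input batch
loss = model(x).sum()        # scalar "loss" just so we can backpropagate
loss.backward()

# Print each linear layer's weight-gradient norm, from first layer to last.
# Layers closer to the input typically receive much smaller gradients.
for i, layer in enumerate(model):
    if isinstance(layer, nn.Linear):
        print(f"layer {i:2d}  |grad| = {layer.weight.grad.norm():.2e}")
```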
