# The Vanishing Gradient Problem

The **vanishing gradient problem** occurs when we train a neural network model using **gradient-based optimization techniques**.

It was a major obstacle to training deep neural network models roughly a decade ago, leading to long training times and degraded model accuracy.

What happens is that as we add more and more hidden layers to the model, the learning speed of the earlier hidden layers keeps getting slower and slower.

Generally, adding more **hidden layers** makes the network able to learn more *complex arbitrary functions*, and thus do a better job of predicting future outcomes. This is where **deep learning** is making a big difference: thanks to the many *hidden layers* a deep network has, we can now make sense of highly complicated data such as images, speech, and video, and do speech recognition, image classification, image captioning, and so on.

**Now coming to the point: what is the vanishing gradient problem?**

Now, when we do **back-propagation**, i.e. move backward through the network and calculate the **gradients** of the loss (error) with respect to the weights, the gradients tend to get *smaller and smaller* as we keep moving **backward** through the network. This means that the **neurons** in the **earlier** *layers learn very slowly* compared to the neurons in the later layers of the hierarchy. The earlier layers in the network are the slowest to train.

**Why are the earlier layers of the network so important to us?**

**Earlier layers** in the network are important because they are responsible for *learning and detecting the simple patterns* and are actually the **building blocks** of our network. Obviously, if they give improper and **inaccurate** results, how can we expect the next layers, and the complete network, to perform well and produce accurate results?

**Now, what harm does it do to our model?**

*The training process takes too long, and the prediction accuracy of the model decreases.*

*This is what the vanishing gradient problem does to our neural network model. Just think of a deep neural network model that is highly complicated and has dozens or hundreds of layers: how problematic it can be to train such a deep network and still produce good, accurate results.*
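The effect is easy to reproduce numerically. Below is a minimal sketch (assuming NumPy is available) of back-propagation through a toy chain of scalar sigmoid "layers": each backward step multiplies the incoming gradient by the weight times the sigmoid derivative, which is at most 0.25, so the gradient that reaches the earliest layer all but vanishes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # never exceeds 0.25

rng = np.random.default_rng(0)
n_layers = 30

# Forward pass through a toy network of scalar sigmoid layers (no biases).
weights = rng.normal(scale=0.5, size=n_layers)
pre_activations = []
a = 0.5  # input
for w in weights:
    z = w * a
    pre_activations.append(z)
    a = sigmoid(z)

# Backward pass: the chain rule contributes one w * sigmoid'(z) factor per layer.
grad = 1.0  # gradient of the loss w.r.t. the final activation
for w, z in zip(reversed(weights), reversed(pre_activations)):
    grad *= w * sigmoid_grad(z)

print(abs(grad))  # tiny: the gradient reaching the first layer has vanished
```

With 30 layers, the gradient magnitude reaching the first layer is negligible; with only 3 or 4 layers it is still usable, which is why shallow networks never suffered from this as badly.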

This is the reason why we do not use **sigmoid** and **tanh** as activation functions: they cause the **vanishing gradient** problem, as mentioned in my article. Hence, nowadays we mostly use **ReLU**-based activation functions when training deep neural network models, to avoid such complications and improve accuracy.
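To see why ReLU helps, compare the derivatives directly. In this small sketch (again assuming NumPy), the sigmoid derivative never exceeds 0.25, so every sigmoid layer shrinks the back-propagated gradient by at least a factor of four, while the ReLU derivative is exactly 1 for any positive input, letting gradients pass through active units unchanged.

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    # Derivative of max(0, z): 1 where the unit is active, 0 otherwise.
    return (z > 0).astype(float)

z = np.linspace(-5.0, 5.0, 101)

print(sigmoid_grad(z).max())      # at most 0.25 (the peak, at z = 0)
print(relu_grad(z[z > 0]).min())  # 1.0 for every active unit
```

ReLU is not a free lunch (inactive units pass zero gradient, the "dying ReLU" issue), but it removes the guaranteed per-layer shrinkage that sigmoid and tanh impose.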

Hope this answers your question.

Here is an amazing video that explains the vanishing gradient problem in full. Do watch it.