Gradient descent with momentum usually converges much faster than standard gradient descent. The basic idea is to compute an exponentially weighted average of the gradients and then use that average, rather than the raw gradient, to update the weights.

How does it work?

Consider an example where we are trying to optimize a cost function whose contours look like the ones below, with the red dot marking the location of the local optimum (minimum).

We start gradient descent from point ‘A’ and, after one iteration, end up at point ‘B’ on the other side of the ellipse. The next iteration of gradient descent may land at point ‘C’. Iteration after iteration, oscillating up and down, we step toward the local optimum. If we used a higher learning rate, the vertical oscillations would be even larger. These vertical oscillations therefore slow down gradient descent and prevent us from using a much higher learning rate.

By taking the exponentially weighted average of the dW and db values, the oscillations in the vertical direction average out to nearly zero, since they point in both the positive and negative directions. In the horizontal direction, however, all the derivatives point the same way (to the right), so the horizontal average remains large. This lets the algorithm take a straighter path toward the local optimum while damping out the vertical oscillations, so it reaches the local optimum in fewer iterations.
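A quick numerical sketch of this averaging effect, using made-up gradient sequences (not values from a real network): an alternating "vertical" component averages toward zero, while a consistent "horizontal" component survives.

```python
# Illustrative sketch: an exponentially weighted average damps an
# oscillating (vertical) gradient component but preserves a
# consistent (horizontal) one. The gradient values are made up.
beta = 0.9
vertical = [1.0, -1.0] * 10      # gradients that flip sign each step
horizontal = [1.0] * 20          # gradients that always point the same way

def ewa(values, beta):
    v = 0.0
    for g in values:
        v = beta * v + (1 - beta) * g
    return v

print(ewa(vertical, beta))    # close to 0: the oscillations cancel out
print(ewa(horizontal, beta))  # close to 1: the consistent direction survives
```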

Implementation

We use dW and db to update our parameters W and b during backpropagation as follows:

W = W - learning rate * dW

b = b - learning rate * db
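As a minimal sketch, the plain update looks like this in NumPy; the dW, db, and learning-rate values here are illustrative, not taken from a real training run.

```python
import numpy as np

# Sketch of the plain gradient-descent update, assuming dW and db
# are the gradients computed during backpropagation (values made up).
learning_rate = 0.01
W = np.array([0.5, -0.3])
b = 0.1
dW = np.array([0.2, -0.1])
db = 0.05

W = W - learning_rate * dW
b = b - learning_rate * db
```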

With momentum, instead of using dW and db independently in each epoch, we take their exponentially weighted averages:

VdW = β * VdW + (1 - β) * dW

Vdb = β * Vdb + (1 - β) * db

Here beta ‘β’ is another hyperparameter, called the momentum, ranging from 0 to 1. It sets the weight between the average of the previous values and the current value when computing the new weighted average.

We’ll update our parameters after calculating the exponentially weighted averages.

W = W - learning rate * VdW

b = b - learning rate * Vdb
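Putting the pieces together, here is a minimal sketch of the full momentum loop. The `grad_fn` callback and the toy quadratic cost are assumptions for illustration; in practice the gradients would come from backpropagation.

```python
import numpy as np

# Sketch of gradient descent with momentum, assuming grad_fn returns
# (dW, db) for the current parameters. grad_fn here is a toy quadratic,
# not a real network's backpropagation.
def momentum_update(W, b, grad_fn, learning_rate=0.01, beta=0.9, epochs=100):
    VdW = np.zeros_like(W)   # exponentially weighted average of dW
    Vdb = 0.0                # exponentially weighted average of db
    for _ in range(epochs):
        dW, db = grad_fn(W, b)
        VdW = beta * VdW + (1 - beta) * dW
        Vdb = beta * Vdb + (1 - beta) * db
        W = W - learning_rate * VdW
        b = b - learning_rate * Vdb
    return W, b

# Toy example: minimize f(W, b) = ||W||^2 + b^2, whose gradient is (2W, 2b).
W, b = momentum_update(np.array([1.0, -2.0]), 3.0, lambda W, b: (2 * W, 2 * b))
```

After enough epochs the parameters shrink toward the minimum at zero, with the averaged velocity keeping the steps pointed in a consistent direction.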

How to choose Beta?

• The momentum term beta must be high to smooth out the updates, because it gives more weight to the past gradients.
• The default value β = 0.9 is suggested, but it can be tuned between 0.8 and 0.999 if needed.
• Momentum takes past gradients into account to smooth out the gradient steps. It can be used with batch gradient descent, mini-batch gradient descent, or stochastic gradient descent.
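The smoothing effect of a higher beta can be seen on a simulated noisy gradient stream (the noise model below is an assumption for illustration): the higher-beta average wanders far less than the lower-beta one.

```python
import random

# Illustrative sketch: a higher beta gives a smoother running average,
# because more weight is kept on past gradients. The noisy "gradients"
# below are simulated, not taken from a real network.
random.seed(0)
noisy_grads = [1.0 + random.uniform(-0.5, 0.5) for _ in range(300)]

def smooth(grads, beta):
    v, out = 0.0, []
    for g in grads:
        v = beta * v + (1 - beta) * g
        out.append(v)
    return out

def spread(xs):
    tail = xs[150:]              # skip the warm-up phase of the average
    return max(tail) - min(tail)

print(spread(smooth(noisy_grads, 0.5)))   # wider: follows the noise
print(spread(smooth(noisy_grads, 0.98)))  # narrower: noise averaged away
```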

References:

Deep Learning Specialization by Andrew Ng


Written by

## Bibek Shah Shankhar

I post articles on Data Science | Machine Learning | Deep Learning. Connect with me on LinkedIn: https://www.linkedin.com/in/bibek-shah-shankhar/
