Adam Optimization Algorithm

The Adam optimization algorithm is one of the few optimization algorithms that has proven to work well across a wide variety of deep learning models. Adam essentially ties together Momentum and RMSprop. Adam stands for Adaptive Moment Estimation.

How does it work?

  1. First, it computes an exponentially weighted average of past gradients and stores it in the variables VdW and Vdb (before bias correction) and VdWcorrected and Vdbcorrected (with bias correction).
  2. It then computes an exponentially weighted average of the squares of past gradients and stores it in SdW and Sdb (before bias correction) and SdWcorrected and Sdbcorrected (with bias correction).
  3. Finally, it updates the parameters in a direction that combines the information from “1” and “2”, as shown in the compact form below.
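
For reference, one full Adam step can be written compactly as follows (the same update is applied to b; this is the standard Adam formulation, rewritten here with the VdW/SdW notation used in this article, where t is the iteration number):

```latex
\begin{aligned}
V_{dW} &= \beta_1 V_{dW} + (1-\beta_1)\,dW, &\quad
S_{dW} &= \beta_2 S_{dW} + (1-\beta_2)\,dW^2,\\
V_{dW}^{\text{corrected}} &= \frac{V_{dW}}{1-\beta_1^{t}}, &\quad
S_{dW}^{\text{corrected}} &= \frac{S_{dW}}{1-\beta_2^{t}},\\
W &\leftarrow W - \alpha\,\frac{V_{dW}^{\text{corrected}}}{\sqrt{S_{dW}^{\text{corrected}}}+\varepsilon}
\end{aligned}
```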

Implementation

  1. To implement the Adam optimization algorithm, we first initialize:

VdW = 0, SdW = 0, Vdb = 0, Sdb = 0

2. Then on iteration t:

Compute the gradients dW and db on the current mini-batch.

3. Then do the Momentum exponentially weighted average:

VdW = β1 * VdW + (1 - β1) * dW

Vdb = β1 * Vdb + (1 - β1) * db

4. And do the RMSprop update as well, where the gradients are squared element-wise:

SdW = β2 * SdW + (1 - β2) * dW²

Sdb = β2 * Sdb + (1 - β2) * db²

5. The typical Adam implementation uses bias correction, so we compute VdWcorrected and Vdbcorrected (where “corrected” means after bias correction):

VdWcorrected = VdW / (1 - β1^t)

Vdbcorrected = Vdb / (1 - β1^t)

6. And then, similarly, we apply bias correction to S as well:

SdWcorrected = SdW / (1 - β2^t)

Sdbcorrected = Sdb / (1 - β2^t)

7. Finally, we perform the parameter update (see the code sketch after the notes below):

W = W - α * VdWcorrected / (sqrt(SdWcorrected) + ε)

b = b - α * Vdbcorrected / (sqrt(Sdbcorrected) + ε)

where:

  • Epsilon ‘ε’ is a very small number used to avoid division by zero (typically ε = 10^-8).
  • β1 and β2 are hyperparameters that control the two exponentially weighted averages. In practice, the default values β1 = 0.9 and β2 = 0.999 work well.
  • Alpha ‘α’ is the learning rate; a range of values should be tested to see what works best for a given problem.
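
Putting steps 1 through 7 together, below is a minimal NumPy sketch of a single Adam update for one weight matrix W and bias vector b. The function name adam_update and the toy quadratic loss in the usage example are illustrative choices of mine, not part of the algorithm itself; the hyperparameter defaults follow the values listed above.

```python
import numpy as np

def adam_update(W, b, dW, db, VdW, Vdb, SdW, Sdb, t,
                alpha=0.001, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step for a single weight matrix W and bias vector b.

    t is the 1-based iteration counter used for bias correction.
    Returns the updated parameters and the updated moment estimates.
    """
    # Step 3: Momentum-style exponentially weighted average of the gradients
    VdW = beta1 * VdW + (1 - beta1) * dW
    Vdb = beta1 * Vdb + (1 - beta1) * db

    # Step 4: RMSprop-style average of the element-wise squared gradients
    SdW = beta2 * SdW + (1 - beta2) * dW ** 2
    Sdb = beta2 * Sdb + (1 - beta2) * db ** 2

    # Steps 5-6: bias correction
    VdW_corrected = VdW / (1 - beta1 ** t)
    Vdb_corrected = Vdb / (1 - beta1 ** t)
    SdW_corrected = SdW / (1 - beta2 ** t)
    Sdb_corrected = Sdb / (1 - beta2 ** t)

    # Step 7: parameter update
    W = W - alpha * VdW_corrected / (np.sqrt(SdW_corrected) + epsilon)
    b = b - alpha * Vdb_corrected / (np.sqrt(Sdb_corrected) + epsilon)

    return W, b, VdW, Vdb, SdW, Sdb


# Usage sketch: initialize the moment estimates to zero (step 1) and call
# adam_update once per mini-batch, incrementing t each time.
W, b = np.random.randn(3, 2), np.zeros(2)
VdW, Vdb = np.zeros_like(W), np.zeros_like(b)
SdW, Sdb = np.zeros_like(W), np.zeros_like(b)
for t in range(1, 101):
    dW, db = 2 * W, 2 * b  # gradients of a toy quadratic loss ||W||^2 + ||b||^2
    W, b, VdW, Vdb, SdW, Sdb = adam_update(W, b, dW, db, VdW, Vdb, SdW, Sdb, t)
```

In a real network you would keep one set of V and S variables per parameter tensor, and in practice you would rely on the Adam implementation built into your deep learning framework; this sketch is only meant to make the update rule concrete.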

