
Overview of different Optimizers for neural networks

Renu Khandelwal
Feb 3, 2019 · 6 min read

The objective of a Machine Learning algorithm

Gradient based learning

Gradient Descent

θ = θ − η ∇J(θ; x, y)
θ is the weight parameter, η is the learning rate, and ∇J(θ; x, y) is the gradient of the loss with respect to θ.
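The update rule described above, θ ← θ − η ∇J(θ; x, y), can be sketched in a few lines of NumPy. This is a minimal illustration on a toy objective; the function name is illustrative, not a standard API:

```python
import numpy as np

def gradient_descent_step(theta, grad, lr=0.1):
    """One vanilla gradient-descent update: theta <- theta - lr * grad."""
    return theta - lr * grad

# Toy objective J(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([1.0])
for _ in range(100):
    theta = gradient_descent_step(theta, 2 * theta, lr=0.1)
# theta has been driven close to the minimum at 0
```

Each step moves the weight a fraction η of the gradient in the downhill direction; with a suitable learning rate the iterates contract toward the minimum.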

Types of Gradient Descent
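The variants usually meant here are batch, stochastic, and mini-batch gradient descent, which differ only in how many training samples feed each update. A minimal sketch (the data, names, and hyperparameters are illustrative), where `batch_size` selects the variant:

```python
import numpy as np

# Toy data: 8 samples of y = 3x, so the optimal weight is exactly 3.
X = np.arange(1, 9, dtype=float).reshape(8, 1) / 4.0
y = 3.0 * X[:, 0]

def mse_grad(w, Xb, yb):
    """Gradient of mean((Xb @ w - yb) ** 2) with respect to w."""
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

# batch_size = 8 -> batch GD, 1 -> stochastic GD, in between -> mini-batch
w, batch_size = np.zeros(1), 4
for epoch in range(50):
    for start in range(0, len(X), batch_size):
        Xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        w = w - 0.1 * mse_grad(w, Xb, yb)
```

Smaller batches give noisier but cheaper updates; batch gradient descent uses the exact gradient of the full dataset at the cost of one pass per step.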

Role of an optimizer

Types of Optimizers

Source: Genevieve B. Orr
Momentum
Momentum gradient descent takes the gradients of previous time steps into consideration.
Source: http://cs231n.github.io/neural-networks-3/
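Momentum folds the gradients of previous time steps into a velocity term that is decayed by γ each step. A minimal sketch of the standard formulation v ← γv + η∇J(θ), θ ← θ − v (function names are illustrative):

```python
import numpy as np

def momentum_step(theta, v, grad, lr=0.1, gamma=0.9):
    """Momentum update: the velocity v accumulates past gradients,
    weighted by gamma, and theta moves along the velocity."""
    v = gamma * v + lr * grad
    return theta - v, v

# Toy objective J(theta) = theta^2 with gradient 2 * theta.
theta, v = np.array([1.0]), np.zeros(1)
for _ in range(400):
    theta, v = momentum_step(theta, v, 2 * theta)
```

Because consecutive gradients along a consistent direction reinforce each other, momentum speeds up progress along shallow, consistent slopes and damps oscillation across steep ones.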
Nesterov Accelerated Gradient
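Nesterov Accelerated Gradient differs from plain momentum in that the gradient is evaluated at the look-ahead point θ − γv rather than at the current θ. A minimal sketch on a toy objective (all names illustrative):

```python
import numpy as np

def grad_J(theta):
    """Gradient of the toy objective J(theta) = theta^2."""
    return 2 * theta

def nesterov_step(theta, v, lr=0.1, gamma=0.9):
    """NAG: the gradient is taken at the look-ahead position
    theta - gamma * v, giving the update a built-in correction."""
    v = gamma * v + lr * grad_J(theta - gamma * v)
    return theta - v, v

theta, v = np.array([1.0]), np.zeros(1)
for _ in range(200):
    theta, v = nesterov_step(theta, v)
```

Looking ahead lets the update correct itself before overshooting, which typically damps the oscillations of plain momentum.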
Adagrad
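Adagrad adapts the learning rate per parameter: η is divided by the square root of the accumulated sum of squared gradients, so frequently updated parameters take smaller and smaller steps. A minimal sketch (names and hyperparameters illustrative):

```python
import numpy as np

def adagrad_step(theta, G, grad, lr=0.5, eps=1e-8):
    """Adagrad: G is the running sum of squared gradients; the
    per-parameter step size shrinks as G grows."""
    G = G + grad ** 2
    theta = theta - lr * grad / (np.sqrt(G) + eps)
    return theta, G

# Toy objective J(theta) = theta^2 with gradient 2 * theta.
theta, G = np.array([1.0]), np.zeros(1)
for _ in range(500):
    theta, G = adagrad_step(theta, G, 2 * theta)
```

The drawback is that G only grows, so the effective learning rate monotonically decays and can eventually become too small to make progress.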
γ is the decay term, taking values between 0 and 1, and gₜ is the moving average of squared gradients.
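A γ-weighted moving average of squared gradients is the mechanism behind RMSProp and Adadelta: unlike Adagrad's ever-growing sum, old gradients are forgotten at rate γ. A minimal sketch assuming the RMSProp form (names and hyperparameters illustrative):

```python
import numpy as np

def rmsprop_step(theta, g_sq, grad, lr=0.01, gamma=0.9, eps=1e-8):
    """RMSProp: g_sq is a decaying (gamma-weighted) average of squared
    gradients; the step is the gradient normalized by sqrt(g_sq)."""
    g_sq = gamma * g_sq + (1 - gamma) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(g_sq) + eps)
    return theta, g_sq

# Toy objective J(theta) = theta^2 with gradient 2 * theta.
theta, g_sq = np.array([1.0]), np.zeros(1)
for _ in range(500):
    theta, g_sq = rmsprop_step(theta, g_sq, 2 * theta)
```

Because the average decays, the effective learning rate no longer shrinks to zero the way Adagrad's does, at the cost of a bounded oscillation near the optimum.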
Source and full animations: Alec Radford

Data Driven Investor
