Deep Learning

Want your model to converge faster? Use RMSProp!

This is used to speed up Gradient Descent.

Danyal Jamil
Nov 2, 2020 · 5 min read


This article is a continuation of a previous series of articles. Here are links to those stories, if you’d like to come along.

Now, who doesn’t like Deep Learning? I suppose you do, or why else would you be reading this article? But Deep Learning is still just starting out; there is a LOT that has yet to be discovered in the field.

Despite the rapid advancements and studies being done in the field, there is still a ton of stuff that we need to unveil. GPT-3 is an example of what the future could look like!

Now, we generally use Deep Learning when we deal with images because, in my opinion, that is primarily where Neural Nets, and specifically Convolutional Neural Nets, shine. And there is data everywhere today. Think of a project right now.

..

..

..

‘Credit Card Fraud Detection’? ~ Search on Google and you’ll find several datasets and/or models already up and running.

‘Car detection’? ~ already up and running.

‘Face Recognition’? ~ same.

You see? People are doing whatever it takes to research and come up with new and improved algorithms to solve problems. There is constant research being done in the field, and still, Machine Learning will see exponential demand in the near future.

RMSProp (Root Mean Square Prop)

RMSProp is another famous optimizer.

Root Mean Square Prop, or RMSProp, uses the same concept of the exponentially weighted average of the gradients as gradient descent with momentum, but the difference lies in how the parameters are updated.
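The exponentially weighted average at the heart of RMSProp can be sketched in a few lines of Python. The gradient values and β = 0.9 here are made-up, illustrative choices, not numbers from this article:

```python
# Exponentially weighted average of a noisy sequence of gradient values.
# beta = 0.9 is an assumed, typical value; the running average s starts at 0.
beta = 0.9
s = 0.0
gradients = [1.0, 3.0, 2.0, 4.0]  # made-up gradient values for illustration
smoothed = []
for g in gradients:
    # new average = mostly the old average, plus a small share of the new value
    s = beta * s + (1 - beta) * g
    smoothed.append(s)
print(smoothed)
```

Each new value only nudges the running average, so short-lived spikes in the gradient get damped out.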

[Image: contour plot of gradient descent oscillating towards the central minimum (taken from here)]

Now, the red dot at the center is the global minimum, and it is what the algorithm/Machine Learning model is trying to reach. The model takes steps that bring it closer to the minimum region.

The steps are represented by the blue lines here. We can see that the steps oscillate, yet with each step, the model gets closer to the minimum region.

Now, larger steps are acceptable at the start, but as we progress, we need smaller and smaller steps in order to reach the center region and stay there!

Hence, we need to decrease the magnitude of the steps we take over time, or else the model will overshoot the needed region and perform poorly.

“What causes these oscillations?” ~ you might ask. Remember that we add a bias term to our “Wx + b” equation; in this picture, the vertical oscillations come from the updates to the bias term, while the movement towards the minimum is driven by the weights.

If we slow down the updates for the bias, we can damp out the vertical oscillations, and if we keep updating the weights with larger steps, we can still move quickly towards the minimum point.

Implementation

We know that the normal back-propagation process that we take is:

W = W – learning_rate * ∂W
b = b – learning_rate * ∂b
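As a quick sketch, this plain update can be run on a toy one-dimensional problem. Here f(w) = w², so the gradient is 2w; the starting point and learning rate are illustrative values I chose, not from the article:

```python
# Plain gradient descent on f(w) = w**2, whose gradient is 2*w.
# Starting point and learning_rate are made-up, illustrative values.
w = 5.0
learning_rate = 0.1
for _ in range(50):
    dW = 2 * w                   # gradient of f at the current w
    w = w - learning_rate * dW   # the update rule from above
print(w)  # w shrinks towards the minimum at 0
```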

But in RMSProp, instead of using ∂W and ∂b directly, we use the idea of exponentially weighted averages and first find S∂W and S∂b:

S∂W = β * S∂W + (1 − β) * (∂W)^2
S∂b = β * S∂b + (1 − β) * (∂b)^2

here, ‘β’ is another hyper-parameter which takes values in [0, 1]; it controls how strongly the average weighs past gradients. And now, to update the original W and b:

W = W – learning_rate * (∂W / sqrt(S∂W + ε))
b = b – learning_rate * (∂b / sqrt(S∂b + ε))

where the squaring above is element-wise and ‘ε’ is epsilon:

ε = 10^-8.

“Why are we adding ε?” ~ you might be asking. Well, suppose sqrt(S∂W) or sqrt(S∂b) comes out to be 0. Then we would be dividing ∂W or ∂b by 0, which gives infinity, and that is not what we want. Hence, we add ε, which is just there to make sure the division is never carried out by 0.
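Putting the two update equations together, a minimal NumPy sketch of one RMSProp step could look like this. The function name, dictionary layout, and default hyper-parameter values are my own choices for illustration, not from the article:

```python
import numpy as np

def rmsprop_update(params, grads, cache, learning_rate=0.01, beta=0.9, epsilon=1e-8):
    """One RMSProp step. `cache` holds the running average S for each parameter."""
    new_params, new_cache = {}, {}
    for key in params:
        # Exponentially weighted average of the squared gradients (element-wise square)
        new_cache[key] = beta * cache[key] + (1 - beta) * grads[key] ** 2
        # Divide the gradient by the root of the running average; epsilon guards
        # against division by zero, as described above
        new_params[key] = params[key] - learning_rate * grads[key] / np.sqrt(new_cache[key] + epsilon)
    return new_params, new_cache

# Toy usage: minimise f(W) = W**2, whose gradient is 2*W
params = {"W": np.array([5.0])}
cache = {"W": np.zeros(1)}
for _ in range(300):
    grads = {"W": 2 * params["W"]}
    params, cache = rmsprop_update(params, grads, cache, learning_rate=0.1)
```

Note that the effective step size is roughly learning_rate regardless of the raw gradient magnitude, since the gradient is divided by the root of its own running average.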

[Image by Author: comparison of gradient descent with and without RMSProp]

The blue line is normal GD, and the green line is GD with RMSProp. We can see that it’s easier to reach the minimum region using RMSProp.

Visualization

[Image: comparison of optimizer trajectories, with RMSProp shown in black (taken from here)]

What we can observe from this graph is that RMSProp, the black line, heads straight down; it doesn’t really matter how small the gradients get, because RMSProp scales the effective learning rate so that the algorithm moves through the minimum region faster than most.

Conclusion

RMSProp is a very powerful and famous optimizer. It is so popular that only the Adam optimizer has surpassed it, making it one of the most used optimization algorithms in the era of Deep Learning.

Contacts

If you want to keep updated with my latest articles and projects, follow me on Medium. Here are some of my contact details:

Happy Learning. :)


Danyal Jamil

Written by

Machine Learning Enthusiast | Quick Learner | Student

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com
