Gradient Descent Part 1: The Intuition

Harshwardhan Jadhav
Published in Analytics Vidhya · 4 min read · Jan 11, 2021

What is Gradient Descent in Machine Learning?

Everyone who learns or works in the field of Machine Learning eventually comes across one algorithm called Gradient Descent, and we have to admit that Gradient Descent has made life simpler for many algorithms. This article is all about Gradient Descent: here we will see what exactly Gradient Descent is and how we can use it to simplify the work of an ML algorithm and get the best results out of it.

Let’s get started,

As we all know, Wikipedia holds most of the information anyone wants to read, and it has its own definition of Gradient Descent. So let's see what Wikipedia says about it:

Gradient Descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. To find a local minimum of a function using gradient descent, we take steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point. But if we instead take steps proportional to the positive of the gradient, we approach a local maximum of that function; the procedure is then known as gradient ascent.

Very simple, right? I just love Wikipedia for its simple definitions. Now let's start with a simple example of what Wikipedia is trying to convey in the above definition.

Hill of Potatoes
Image Courtesy: https://www.potatonewstoday.com/2021/01/04/webinar-global-climate-change-and-potato-storages/

You must be familiar with a hill of potatoes like the one above. Imagine you are in the market to buy potatoes and you come across this potato store. You start choosing good potatoes from this hill; like everyone else, you want the best potatoes to cook. Suppose you start picking potatoes from the top of the hill and keep moving down, down, and down until you find that the best-quality potatoes are at the bottom of the hill. So you stop at that place, happily pick some of the best-quality potatoes, and buy them.

What exactly did you do in the above scenario? You found the best (optimum) place on that big potato hill, the place where the best-quality potatoes are, right?

Similarly,

The Gradient Descent algorithm helps us find the best (optimum) value of 'x', the one at which a function f(x) takes its minimum value. I hope you now have an idea of what exactly we are going to do using Gradient Descent; if not, keep reading, I am revealing the secrets of Gradient Descent slowly.
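The potato-hill idea can be sketched in a few lines of code. Below is a minimal illustration (not the formal algorithm, which comes in the next part): we repeatedly step against the slope of an example function f(x) = (x - 3)^2, whose minimum sits at x = 3. The starting point, step size, and iteration count are illustrative choices.

```python
def f(x):
    # an example "hill": a parabola with its bottom at x = 3
    return (x - 3) ** 2

def df(x):
    # slope of f: d/dx (x - 3)^2 = 2 * (x - 3)
    return 2 * (x - 3)

x = 0.0              # start somewhere on the hill
learning_rate = 0.1  # how big a step to take each time

for _ in range(100):
    x = x - learning_rate * df(x)  # step opposite to the slope

print(round(x, 4))   # ends up very close to 3.0, the minimizer
```

If we stepped *with* the slope instead (`x + learning_rate * df(x)`), we would climb away from the minimum; that direction is the "gradient ascent" mentioned in the Wikipedia definition.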

This was an intuitive understanding of the Gradient Descent algorithm, but since it is used in machine learning, we must also know its mathematical formulation to understand its practical use.

Before diving into the mathematics of Gradient Descent, we need to know the following concepts:

A. What is the slope?

The Slope (also called Gradient) of a straight line shows how steep a straight line is.

To calculate the slope, divide the change in height by the change in the horizontal distance:

slope = (change in y) / (change in x)

Image Courtesy: https://www.mathsisfun.com/geometry/slope.html

In calculus notation, the slope is written as 'df(x)/dx', where f(x) can be any differentiable function; this notation is generally used when working with slopes in machine learning.

The value of the slope can be negative or positive; look at the following picture.

Image Courtesy: https://www.mathsisfun.com/geometry/slope.html

B. What are Maxima and Minima?

These concepts are closely linked with the value of the slope that we discussed above.

Image Courtesy: https://www.mathsisfun.com/calculus/maxima-minima.html

In a smoothly changing function, a maximum or minimum is always where the function flattens out (except for a saddle point).

Where does it flatten out? Where the slope is zero.

Where is the slope zero? The derivative tells us at what value of 'x' the slope 'df(x)/dx' becomes zero.

In simple words,

  • A high point is called a maximum (plural maxima).
  • A low point is called a minimum (plural minima).
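The "flattens out where the slope is zero" idea can be checked with a small example. For f(x) = x^3 - 3x (a hypothetical example, not from the article), the slope 3x^2 - 3 is zero at x = -1 and x = 1; the slope of the slope (the second derivative) tells us which flat point is a high point and which is a low point.

```python
def df(x):
    # slope of f(x) = x^3 - 3x
    return 3 * x ** 2 - 3

def d2f(x):
    # slope of the slope: negative => maximum, positive => minimum
    return 6 * x

for x in (-1.0, 1.0):
    assert abs(df(x)) < 1e-12  # the function flattens out here
    kind = "maximum" if d2f(x) < 0 else "minimum"
    print(x, kind)  # -1.0 maximum, then 1.0 minimum
```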

Now you are ready for the mathematical understanding of Gradient Descent, which I will explain in the next part of this blog. I hope you enjoyed reading this blog; share it if you liked it, and also give it a clap. Thanks for making it to the end, and I will see you in the next part soon.

For part 2: Click here

