Intuition behind Gradient Descent

Gautam A
2 min readMar 23, 2017


Gradient descent is a mathematical optimization technique with applications in many fields, including Machine Learning. Even if you have only just started your machine learning career, you have likely come across the term “Gradient Descent”.

If you have started learning machine learning, the place you probably first came across the term “Gradient Descent” is “Linear Regression”. Linear regression is pretty simple to understand once you understand the simple maths behind it. Please note that this post describes only “Gradient Descent”, not “Linear Regression”. The meaning of “Gradient Descent” lies in the name itself: “gradient” means “slope/inclination” and “descent” means “moving down/falling”, i.e. gradient descent simply guides us to move in the direction opposite to the slope.

Consider the following diagram, which shows a function “y = f(x)” drawn on the Cartesian coordinate plane, along with the slope of the graph at x = 1.

Graph of y=f(x)=x² and a tangent drawn to it at x=1.

Our Goal: Identify the value of “x” at which “f(x)” attains its minimum.

In the picture shown above, the function f(x) attains its minimum value at x = 0. To find the value of x that minimizes f(x), start from a random x and compute the gradient (slope) of f(x) at that point. Say we start at x = 1: the gradient of f(x) at x = 1 is 2, a positive number, yet notice in the picture that to reach the minimum of f(x) we need to go left of x = 1 (towards x = 0), i.e. we need to move in the negative direction along the x-axis. Similarly, if we had started at x = -1, the slope of f(x) at x = -1 is -2, a negative number, but to reach the minimum of f(x) we need to move in the positive direction along the x-axis. In both cases we move opposite to the sign of the gradient. After some iterations, the value of x converges to 0 and f(x) attains its minimum value.
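The iteration described above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the post: the function `gradient_descent` and the learning rate of 0.1 are my own choices, and the derivative 2x of f(x) = x² is hard-coded.

```python
# Gradient descent on f(x) = x**2, whose derivative is f'(x) = 2*x.
# The learning rate and iteration count are illustrative choices.

def gradient_descent(start_x, learning_rate=0.1, iterations=100):
    x = start_x
    for _ in range(iterations):
        grad = 2 * x                   # slope of f at the current x
        x = x - learning_rate * grad   # step opposite to the slope
    return x

# Starting from x = 1 (positive slope), x moves left toward 0.
print(round(gradient_descent(1.0), 6))   # → 0.0
# Starting from x = -1 (negative slope), x moves right toward 0.
print(round(gradient_descent(-1.0), 6))  # → -0.0
```

Note how the same update rule, x ← x − learning_rate × gradient, handles both starting points: the sign of the gradient automatically picks the correct direction.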

It’s clear that to reach the minimum of f(x), wherever we start, we must move in the opposite direction of the gradient at that point. That is the reason this method is called Gradient Descent.

To keep things easy to understand, I have presented a very basic version of Gradient Descent and have not covered its limitations or the remedies we can apply to them.

The Gradient Ascent method works similarly, except it is used to maximize the value of a function: we move in the same direction as the gradient rather than the opposite one.
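As a quick sketch of the sign flip, here is gradient ascent on an example function of my own choosing, g(x) = −(x − 2)², which has its maximum at x = 2. The function name, learning rate, and example function are all illustrative assumptions, not from the post.

```python
# Gradient ascent on g(x) = -(x - 2)**2, maximized at x = 2.
# Its derivative is g'(x) = -2 * (x - 2).

def gradient_ascent(start_x, learning_rate=0.1, iterations=100):
    x = start_x
    for _ in range(iterations):
        grad = -2 * (x - 2)            # slope of g at the current x
        x = x + learning_rate * grad   # step WITH the slope (ascent)
    return x

print(round(gradient_ascent(0.0), 6))  # → 2.0
```

The only change from the descent version is the plus sign in the update: stepping with the gradient climbs toward a maximum instead of descending toward a minimum.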

This is the end of this post. Please leave comments to help improve the quality of my posts.
