Gradient Descent in 4 minutes

Shobhit Srivastava
AITS Journal

--


Gradient Descent is one of the most important and heavily used optimization algorithms in the field of machine learning. Whether you are talking about Linear Regression, Gradient Boosting, or Neural Networks, it is a fundamental element in the working of almost every machine learning algorithm.

In this article we will go through a brief but clear explanation of gradient descent and its inner workings.

Table of Contents:

  1. What is gradient descent?
  2. How does it work?
  3. Its variants

What is Gradient Descent?

Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point. Confused?

Basically, Gradient Descent is a machine learning optimization algorithm. Now the question arises: what is optimization, and why do we need it in machine learning? Well, it is a technique employed to reduce the cost function, which measures the difference between the predicted and the actual results, and it returns the final weights and bias that best fit the hypothesis.

Now, let us understand this with the help of a diagram (gif).

Animation of gradient descent converging to a minimum (Source: Giphy)

So what we do is pick random values for the weights and bias, train the model using those values, and then calculate the cost function using:

Mean squared error cost function: J(θ₀, θ₁) = (1 / 2m) · Σᵢ (h(xᵢ) − yᵢ)², where h(xᵢ) = θ₀ + θ₁xᵢ is the prediction for the i-th example and m is the number of examples.
Source: Freecodecamp

Now we differentiate the cost function with respect to those randomly chosen weights and bias. In simple terms, we follow the slope of the curve downhill, and at each step we update the values of the weights and bias. There is one parameter we make use of here, the so-called learning rate, whose value we choose so that each step is small and the chance of overshooting the local minimum (the lowest point on the curve) is minimized. We repeat the process until we reach the lowest value of the cost function. See the diagram below, followed by a short code sketch, for a better understanding.

Source : Mc.ai
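To make the loop above concrete, here is a minimal sketch of gradient descent for simple linear regression in Python/NumPy. The toy data, variable names, and hyperparameter values are my own illustration; they are not taken from the repository linked later in this article.

import numpy as np

# Toy data: y = 2x + 1 plus a little noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 * x + 1 + rng.normal(0, 0.5, size=50)

# Step 1: start from random values of the weight (theta1) and bias (theta0)
theta1, theta0 = rng.normal(), rng.normal()
alpha = 0.01   # learning rate
m = len(x)

for step in range(2000):
    # Step 2: predictions and mean squared error cost
    y_pred = theta1 * x + theta0
    cost = np.sum((y_pred - y) ** 2) / (2 * m)
    if step % 500 == 0:
        print(step, cost)   # watch the cost shrink as we descend

    # Step 3: partial derivatives of the cost with respect to theta1 and theta0
    d_theta1 = np.mean((y_pred - y) * x)
    d_theta0 = np.mean(y_pred - y)

    # Step 4: take a small step against the gradient
    theta1 -= alpha * d_theta1
    theta0 -= alpha * d_theta0

print(theta1, theta0)   # should end up close to the true values 2 and 1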

Want to see the effect of the learning rate? Have a look at the diagram below:

Source : cs231n

See the formula below, where θ₁ denotes the weight, θ₀ the bias, and α the learning rate (both parameters are updated simultaneously):

θ₀ := θ₀ − α · ∂J(θ₀, θ₁)/∂θ₀
θ₁ := θ₁ − α · ∂J(θ₀, θ₁)/∂θ₁

https://vimeo.com/141574374

Source : vimeo.com
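As a quick illustration of why the learning rate matters, here is a tiny example (my own, not from the article or the video) that applies the same update rule to the one-dimensional cost J(θ) = θ²:

def run(alpha, theta=5.0, steps=20):
    # Repeatedly apply: theta := theta - alpha * dJ/dtheta, with J(theta) = theta**2
    for _ in range(steps):
        grad = 2 * theta          # derivative of theta**2
        theta -= alpha * grad
    return theta

print(run(alpha=0.1))   # small step: theta shrinks smoothly towards the minimum at 0
print(run(alpha=1.1))   # step too large: every update overshoots and theta blows up

With a well-chosen α the cost decreases at every step; with too large an α the updates bounce past the minimum and diverge, which is what the learning-rate diagram above is meant to show.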

The final values of the weight and bias replace the initial values in the hypothesis and are used by the final model for prediction.

Now, there are different variants of the Gradient Descent algorithm, all based on the same concept:

  1. Batch Gradient Descent
  2. Stochastic Gradient Descent

In Batch Gradient Descent we update the weights and bias only after iterating over the entire training set, which is treated as one large batch. Batch gradient descent is the most common form of gradient descent described in machine learning.

In Stochastic Gradient Descent, on the other hand, the weights and bias are updated after each individual training example, and the noisy updates reduce the chance of getting stuck in a local minimum and missing the global one. Learning can be much faster with stochastic gradient descent for very large training data sets, and often you only need a small number of passes through the data set to reach a good, or good enough, set of coefficients, e.g. 1-to-10 passes. A code sketch contrasting the two variants follows.
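To make the contrast concrete, here is a minimal sketch of both variants on the same kind of toy linear-regression problem as before. The function names, data, and hyperparameters are my own illustrative choices, not code from the article's repository.

import numpy as np

def batch_gradient_descent(x, y, alpha=0.01, epochs=2000):
    # One update per pass over the WHOLE training set
    theta1, theta0 = 0.0, 0.0
    for _ in range(epochs):
        y_pred = theta1 * x + theta0
        theta1 -= alpha * np.mean((y_pred - y) * x)   # gradient averaged over all samples
        theta0 -= alpha * np.mean(y_pred - y)
    return theta1, theta0

def stochastic_gradient_descent(x, y, alpha=0.01, epochs=10):
    # One update per individual training example
    theta1, theta0 = 0.0, 0.0
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        for i in rng.permutation(len(x)):             # shuffle the examples each pass
            err = (theta1 * x[i] + theta0) - y[i]
            theta1 -= alpha * err * x[i]
            theta0 -= alpha * err
    return theta1, theta0

x = np.linspace(0, 10, 50)
y = 2 * x + 1
print(batch_gradient_descent(x, y))        # one update per epoch, so it needs many epochs
print(stochastic_gradient_descent(x, y))   # should move close to (2, 1) within a few passes

Note the difference in the default number of passes: the batch version averages the gradient over all examples before every single update, while the stochastic version makes fifty noisy updates per pass, which is why a handful of passes is often enough.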

Want to go through the full code? Visit my GitHub page: https://github.com/shobhitsrivastava-ds/Gradient_descent-from-scratch

Summary:

In this post you got an idea of what gradient descent is, how it works, the formulas involved, and its variants. Before you leave, please watch this video for a better understanding:

Source: Artificial Intelligence — All in One

Read this post: https://machinelearningmastery.com/gradient-descent-for-machine-learning/

Follow me for more articles like this.
