Gradient Descent: The Backbone of Most Popular Machine Learning Algorithms

Nikhil Chigali
Jun 18, 2018 · 5 min read
Here, the blue line is our linear regression model, the red points are the training data, and the green lines are the errors in our model's predictions for each training sample.
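To make the caption concrete, here is a minimal sketch that fits a line and computes those per-sample errors. The synthetic dataset and the use of NumPy's least-squares fit are my own illustrative choices, not taken from the figure:

import numpy as np

# Synthetic training data (the "red points"): roughly y = 2x + 1 plus noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2 * x + 1 + rng.normal(0, 1, size=x.shape)

# Fit the linear model (the "blue line") with ordinary least squares
slope, intercept = np.polyfit(x, y, deg=1)
predictions = slope * x + intercept

# Per-sample errors (the "green lines"): the vertical distance
# between each training point and the model's prediction
errors = y - predictions
print("Mean squared error:", np.mean(errors ** 2))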



Figures: gradient descent visualized on a 3D surface plot and the corresponding contour plot, converging at the global minimum.
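The plots above correspond to the standard batch gradient descent update: compute the gradient of the loss over the whole dataset, then step downhill. Here is a minimal sketch of that loop for the mean-squared-error loss of a linear model; the learning rate, epoch count, and toy data are my own illustrative choices:

import numpy as np

def batch_gradient_descent(x, y, lr=0.01, epochs=1000):
    """Minimize the MSE of y ~ w*x + b by following the negative gradient."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        pred = w * x + b
        # Gradients of MSE = (1/n) * sum((pred - y)^2) w.r.t. w and b
        dw = (2.0 / n) * np.sum((pred - y) * x)
        db = (2.0 / n) * np.sum(pred - y)
        # Step downhill; since this loss is convex, the descent heads
        # toward the single global minimum seen in the plots
        w -= lr * dw
        b -= lr * db
    return w, b

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])  # roughly y = 2x + 1
print(batch_gradient_descent(x, y))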

Points to note:

1. Data in real-life scenarios may not be so simple that we can fit a linear model to it; sometimes we have to fit a non-linear model to the data.

2. For fitting a non-linear model, we usually prefer polynomial regression over plain linear regression (see the polynomial regression sketch after this list).

3. Running batch gradient descent on huge datasets is inefficient and computationally slow. That is why variations such as stochastic gradient descent and mini-batch gradient descent exist; they do a good job of optimizing on large datasets (see the mini-batch sketch after this list).
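To ground points 1 and 2, here is a minimal polynomial regression sketch. The quadratic data and the degree choice are illustrative assumptions of mine; the point is that expanding the input into polynomial features turns a non-linear fit into an ordinary linear least-squares problem:

import numpy as np

# Non-linear data that a straight line can't fit well: y ~ x^2 - 3x + 2
rng = np.random.default_rng(1)
x = np.linspace(-2, 4, 30)
y = x ** 2 - 3 * x + 2 + rng.normal(0, 0.3, size=x.shape)

# Polynomial regression: fit a degree-2 polynomial via least squares.
# Internally this is still linear regression on the features [x^2, x, 1].
coeffs = np.polyfit(x, y, deg=2)
print("Recovered coefficients (a, b, c):", coeffs)  # approximately [1, -3, 2]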
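And for point 3, a sketch of mini-batch gradient descent, where the batch size, learning rate, and linear model are again my own illustrative choices: each update uses the gradient of a small random batch instead of the full dataset, which is exactly what makes it cheaper per step on large datasets.

import numpy as np

def minibatch_sgd(x, y, lr=0.01, epochs=50, batch_size=32):
    """Mini-batch gradient descent for y ~ w*x + b under MSE."""
    rng = np.random.default_rng(0)
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        order = rng.permutation(n)          # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            xb, yb = x[idx], y[idx]
            pred = w * xb + b
            # Gradient estimated from the batch only, not the full dataset
            dw = (2.0 / len(xb)) * np.sum((pred - yb) * xb)
            db = (2.0 / len(xb)) * np.sum(pred - yb)
            w -= lr * dw
            b -= lr * db
    return w, b

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 10_000)              # stand-in for a "huge" dataset
y = 2 * x + 1 + rng.normal(0, 1, size=x.shape)
print(minibatch_sgd(x, y))

Setting batch_size to 1 recovers plain stochastic gradient descent, while batch_size equal to the dataset size recovers batch gradient descent.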


Written by Nikhil Chigali
Machine Learning and Deep Learning enthusiast, studying at V.R. Siddhartha Engg College. #AlwaysFeedYourCuriosity
