Linear Regression Part II — using Gradient Descent Algorithm

Nandini Sekar
3 min read · Aug 21, 2020


[Figure: Gradient Descent approach]

Introduction:

In this article we will learn about linear regression using an algorithm called Gradient Descent (GD). For an overview of linear regression using SSE, please see Part I: https://medium.com/@nandinisekar27/linear-regression-part-i-cc1a19a3591e

Now let's get into our topic. The word "gradient" means slope, and "descent" means an act of moving downwards. GD performs multiple iterations to find the point at which the slope is 0 or minimal. Hence it can be framed as a "first-order iterative optimization technique for finding the minimum of a function".
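To make that definition concrete, here is a minimal sketch in Python (the function f(x) = (x − 3)² and all the values are my own illustration, not from the article) showing how repeated first-order steps move toward a minimum:

```python
# Minimal 1-D gradient descent sketch: minimize f(x) = (x - 3)^2,
# whose derivative is f'(x) = 2 * (x - 3) and whose minimum sits at x = 3.
def gradient_descent(start, learning_rate=0.1, iterations=100):
    x = start
    for _ in range(iterations):
        slope = 2 * (x - 3)            # first-order information: the gradient
        x = x - learning_rate * slope  # step downhill, against the slope
    return x

print(gradient_descent(start=0.0))  # converges toward 3.0
```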

The predicted value is ŷ = m·x + c, where m is the slope and c is the intercept.

Substituting ŷ (the predicted y) into the SSE cost function from Part I gives SSE(m, c) = Σᵢ (yᵢ − (m·xᵢ + c))².

Thereby, if we substitute different values of m and c, we arrive at a different cost value each time; when all of these are plotted, the result is a convex (bowl-shaped) graph, and the lowest point of that curve is what we need to find.
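As a rough numerical illustration of that bowl shape (a sketch using made-up toy data, not data from the article), we can evaluate the mean of squared errors at a few (m, c) pairs and watch the cost fall as we approach the best-fitting pair and rise again past it:

```python
import numpy as np

# Toy data roughly following y = 2x + 1 (assumed for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

def mse(m, c):
    """Mean squared error between actual y and predicted y_hat = m*x + c."""
    y_hat = m * x + c
    return np.mean((y - y_hat) ** 2)

# The cost is lowest near m = 2, c = 1 and grows in every direction away from it
for m, c in [(0.0, 0.0), (1.0, 0.5), (2.0, 1.0), (3.0, 1.5)]:
    print(f"m={m}, c={c} -> MSE={mse(m, c):.3f}")
```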

[Figure: Flow of gradient descent on the convex cost curve]

The small blue circles in the figure above represent the successive positions of the point as it moves downhill; the spacing between them reflects the learning rate, i.e., the speed at which the point moves from one position to the next in search of the global minimum. A graph may have more than one minimum: each such point is referred to as a "local minimum", and the lowest of them, where we want the slope to reach 0 or close to 0, is referred to as the "global minimum".
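A small sketch (again using the assumed function f(x) = (x − 3)², not anything from the article) shows why the step size matters: too small is slow, a moderate rate converges, and too large a rate overshoots and diverges:

```python
def descend(learning_rate, start=0.0, iterations=25):
    """Run 1-D gradient descent on f(x) = (x - 3)^2 and return the final x."""
    x = start
    for _ in range(iterations):
        x -= learning_rate * 2 * (x - 3)  # f'(x) = 2(x - 3)
    return x

print(descend(0.01))  # small rate: slow but steady progress toward 3
print(descend(0.1))   # moderate rate: converges close to 3 quickly
print(descend(1.1))   # too large: each step overshoots and the iterates diverge
```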

Steps Involved:

  1. Calculate the actual y and the predicted y (ŷ = m·x + c) for every data point.
  2. Calculate their difference and square it (refer to the function a little above).
  3. Substitute values for the slope m and the intercept c in the function; over all values of x, the mean of the squares is calculated: MSE(m, c) = (1/n) Σᵢ (yᵢ − (m·xᵢ + c))².
  4. Initially we can take the m and c values as 0, with a learning rate of 0.001 (the blue circles in the figure above); the rate is tuned depending on how the slope varies at each point.
  5. The learning rate should be small for stable convergence; too large a rate can overshoot the minimum.
  6. Substitute the values of m and c into the partial derivatives of the function, once with respect to m and once with respect to c: ∂MSE/∂m = (−2/n) Σᵢ xᵢ(yᵢ − ŷᵢ) and ∂MSE/∂c = (−2/n) Σᵢ (yᵢ − ŷᵢ).
  7. Update the values of m and c on each iteration: m = m − L·(∂MSE/∂m) and c = c − L·(∂MSE/∂c), where L is the learning rate. This process continues until the minimum of the function (a slope at or near 0) is reached, so that the error is as small as possible (a runnable sketch of all seven steps follows this list).
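Putting steps 1 to 7 together, here is a sketch of the full loop in Python with NumPy (the toy data and the iteration count are my assumptions, not from the article; the update rules follow the partial derivatives above):

```python
import numpy as np

# Toy data roughly following y = 2x + 1 (assumed for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

m, c = 0.0, 0.0  # step 4: start with m = 0 and c = 0
L = 0.001        # step 4: learning rate of 0.001
n = len(x)

for _ in range(50000):
    y_hat = m * x + c                  # step 1: predicted y values
    error = y - y_hat                  # step 2: difference from the actual y
    dm = (-2 / n) * np.sum(x * error)  # step 6: partial derivative w.r.t. m
    dc = (-2 / n) * np.sum(error)      # step 6: partial derivative w.r.t. c
    m -= L * dm                        # step 7: update m against the gradient
    c -= L * dc                        # step 7: update c against the gradient

# Approaches the least-squares fit (about m = 2.0, c = 1.0 for this toy data)
print(f"m = {m:.3f}, c = {c:.3f}")
```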

Conclusion:

With this, we have come to the end of this article. We learnt how "gradient descent" helps in finding the global minimum by taking an iterative approach in a linear regression model. The GD algorithm is not only for linear regression; it is used in various other algorithms as well.

Disadvantages of Linear Regression:

  1. If the data we are going to operate on is non-linear, then it is inappropriate to apply the linear regression algorithm.
  2. It is prone to the bias-variance trade-off problem.

Please wait for my next post, which is on error metrics and the bias and variance problems!

Happy Learning! :)
