Implementing Linear Regression From Scratch
Regression analysis is a predictive modeling technique that investigates the relationship between a dependent variable and one or more independent variables.
This article covers simple linear regression, which has only one dependent and one independent variable.
In the equation of a line, y = mx + c, ‘y’ is the dependent variable, ‘x’ is the independent variable, ‘m’ is the slope, and ‘c’ is the y-intercept.
Read more about slope-intercept form at Khan Academy.
In Linear Regression, we try to find a line that best fits the given data.
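The best-fit line is the line with the minimum sum of squared errors (SSE) or, equivalently, the minimum mean squared error (MSE). MSE is just SSE divided by the number of data points, so the same line minimizes both. Here I am using SSE.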
The mathematical formulation for linear regression is as follows:
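In standard notation, for n data points (x_i, y_i), the SSE objective can be written as below. The index i and the count n are notation assumed here for illustration.

$$
\min_{m,\,c} \; \mathrm{SSE}(m, c) = \sum_{i=1}^{n} \bigl( y_i - (m x_i + c) \bigr)^2
$$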
This means: find the values of ‘m’ and ‘c’ for which the sum of squared errors is at its minimum. This is an optimization problem.
On solving the above-posed optimization problem, we get ‘m’ and ‘c’ values as follows:
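One standard way to reach these values is to set the partial derivatives of SSE with respect to ‘m’ and ‘c’ to zero. The sketch below uses x̄ and ȳ for the sample means of x and y, which is notation assumed here.

$$
\frac{\partial\,\mathrm{SSE}}{\partial c} = -2 \sum_{i=1}^{n} \bigl( y_i - m x_i - c \bigr) = 0
\quad\Longrightarrow\quad
c = \bar{y} - m\,\bar{x}
$$

$$
\frac{\partial\,\mathrm{SSE}}{\partial m} = -2 \sum_{i=1}^{n} x_i \bigl( y_i - m x_i - c \bigr) = 0
\quad\Longrightarrow\quad
m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
$$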
Now let’s implement this in Python.
First, I import the required modules.
I take a sample data set and store it in a pandas DataFrame as follows:
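As a rough sketch of this step, the data can be held in a pandas DataFrame like the one below. The column names `x` and `y` and every value in it are hypothetical, chosen only for illustration; they are not the article's data set.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
dtrain = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [2, 4, 5, 4, 5],
})

# Show the first rows to confirm the frame looks as expected.
print(dtrain.head())
```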
Now, let’s plot the above data
Now, I will create a class for linear regression
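A minimal sketch of such a class, assuming NumPy and the closed-form least-squares solution for ‘m’ and ‘c’, could look like this. The class name `SimpleLinearRegression`, its `fit` and `predict` methods, and the toy data are assumptions made for this example, not necessarily what the notebook uses.

```python
import numpy as np


class SimpleLinearRegression:
    """Least-squares fit of y = m*x + c for a single feature (illustrative sketch)."""

    def fit(self, x, y):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        x_mean, y_mean = x.mean(), y.mean()
        # Slope: sum of co-deviations of x and y divided by the sum of squared x deviations.
        self.m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
        # Intercept: the fitted line passes through the point of means.
        self.c = y_mean - self.m * x_mean
        return self

    def predict(self, x):
        # Apply y = m*x + c element-wise.
        return self.m * np.asarray(x, dtype=float) + self.c


# Illustrative values only (not the article's data set): points lying on y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
model = SimpleLinearRegression().fit(x, y)
print(model.m, model.c)
print(model.predict([6]))
```

Because the toy points lie exactly on a line, the fitted slope and intercept recover that line, which makes the class easy to sanity-check before using it on real data.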
Now, let’s build a model with our dtrain data set
Now, let’s find the predicted values
Now, let’s check the goodness of fit using R-Square
The R-squared value is a statistical measure of how close the data are to the fitted regression line. It can be read as the fraction of the variation in ‘y’ that the line explains.
Here the R-squared value is about 0.3, which is not a good fit.
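One common way to compute it is 1 minus the ratio of the residual sum of squares to the total sum of squares. The helper below is a small sketch of that definition; the function name `r_squared` and the sample values are assumptions for illustration.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Residual sum of squares: variation the model fails to explain.
    ss_res = np.sum((y_true - y_pred) ** 2)
    # Total sum of squares: variation of y around its own mean.
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y_true, y_true)                 # predictions equal the data
baseline = r_squared(y_true, [2.5, 2.5, 2.5, 2.5])  # always predicting the mean
print(perfect, baseline)
```

A perfect fit scores 1, while a model that always predicts the mean of y scores 0, so values near 0 indicate that the line explains little of the variation.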
Now let’s compare our model with the sklearn model.
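A comparison along these lines might look like the sketch below, assuming scikit-learn is installed. The data values are illustrative only, and the closed-form slope and intercept are recomputed inline so that the snippet stands on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative values only (not the article's data set).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Closed-form least-squares slope and intercept.
m = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
c = y.mean() - m * x.mean()

# scikit-learn expects a 2-D feature matrix, hence the reshape to one column.
sk_model = LinearRegression().fit(x.reshape(-1, 1), y)

print("from scratch:", m, c)
print("sklearn:     ", sk_model.coef_[0], sk_model.intercept_)
```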
The sklearn results are exactly the same as our model’s results.
See the full code in the IPython notebook at the GitHub link.
Also if you want to learn more in-depth about Linear Regression and how it works, there’s a great series by Datum Guy which you can refer to here.
Thanks for reading :) cheers!