Linear Regression implementation from scratch using Python.

Sumaya Bai
Published in Analytics Vidhya, Aug 25, 2021

The Scikit-Learn library is pure bliss when it comes to implementing an algorithm to train a machine learning model. It is a very advanced library that can cut our work down to a few lines of code.
Having said that, I believe that knowing an algorithm from scratch, and understanding the math and statistics behind how it works, is of the utmost importance.

In this post, I’ll show you how to implement a simple linear regression algorithm from scratch using Python.

So why wait, come on and be ready to get your hands dirty!

The first step for any Machine Learning model is to collect the data.
I’ll be using a simple dataset, the salary dataset. The dataset can be found here.

So let’s begin by importing all the necessary libraries.

The next step is to read and load our dataset and get some basic information about it.
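As a rough sketch of the import and load steps, something like the following could be used. It assumes pandas is the loading library; the column names `YearsExperience` and `Salary` and the file name `Salary_Data.csv` are assumptions, and the inline rows are made-up stand-ins so the snippet runs on its own.

```python
import io

import pandas as pd

# Stand-in for the real file: a few made-up rows in the same two-column shape.
# With the actual dataset you would call pd.read_csv("Salary_Data.csv") instead
# (that file name and the column names below are assumptions).
csv_text = """YearsExperience,Salary
1.0,40000
2.0,45000
3.0,52000
4.0,58000
5.0,66000
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())      # first few rows
print(df.shape)       # (number of rows, number of columns)
df.info()             # column names, dtypes, non-null counts
print(df.describe())  # summary statistics per column
```

On the real salary dataset, `df.shape` is where the row and column counts would show up.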

This basic information gives us a first idea of our dataset. It has 30 rows and 2 features: Years of Experience and Salary.
Our aim is to find out what the salary of a person with 7 years of experience will be.

For any machine learning problem, the next step is to select a suitable algorithm. To decide which algorithm to go with, we plot the data and look at the relation between the variables.

I’m using a scatterplot to check the relation between the two features.
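A scatterplot like this could be drawn with matplotlib, as in the sketch below. The plotting library is an assumption, and the values are made up for illustration; they are not the salary dataset.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative values, not the real salary data.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40000, 45000, 52000, 58000, 66000]

fig, ax = plt.subplots()
ax.scatter(years, salary)
ax.set_xlabel("Years of Experience")
ax.set_ylabel("Salary")
ax.set_title("Salary vs. Years of Experience")
fig.savefig("scatter.png")  # or plt.show() in an interactive session
```

If the points fall roughly along a straight line, a linear model is a reasonable first choice.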

From the scatterplot, we get the intuition that these variables have a linear relation. So I’ve decided to go with the Simple Linear Regression algorithm to train the model.

What a linear regression model basically does is find the best fit line, that is, the line that fits the data with minimal error.

Now that we have decided on the type of algorithm, we’ll go ahead and assign the dependent and independent variables.

I’m using X and Y to denote my variables. Years of Experience is the independent variable, and Salary is the dependent variable, because salary increases or decreases as years of experience increase or decrease.

The hypothesis of Linear regression:
Let’s define our hypothesis function for a straight line. As we all know, the equation for a straight line is:

$$y = mx + c$$

m is the coefficient (slope) of the variable x.

c is the intercept.

Our dataset has only one feature, so I’ll go ahead and use this same hypothesis function.

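The formulas to calculate the coefficient and intercept are as follows:

$$m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$c = \bar{y} - m\,\bar{x}$$

Here $\bar{x}$ and $\bar{y}$ are the means of x and y, and n is the number of data points. Note that the intercept is not divided by n.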

Now, applying the above formulas, we get the values of m and c.
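A minimal sketch of this calculation in plain Python might look like the following. The function name `fit_line` is hypothetical, and the toy points are chosen to lie exactly on y = 2x + 1 so the result is easy to check; they are not the salary data.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of products of deviations from the means (numerator of m).
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared deviations of x from its mean (denominator of m).
    denominator = sum((x - x_bar) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = numerator / denominator
    c = y_bar - m * x_bar
    return m, c


# Toy points lying exactly on y = 2x + 1 (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
m, c = fit_line(xs, ys)
print(m, c)  # 2.0 1.0
```

With the real dataset, `xs` would hold the years of experience and `ys` the salaries.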

In mathematical terms, the fitted line is $y = mx + c$, with m and c set to the values obtained from the formulas above.

Let’s plot this information on a graph and draw the best fit line.
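One way to draw the data points together with the fitted line is sketched below, again assuming matplotlib and using made-up values rather than the salary dataset. The slope and intercept are recomputed inline so the snippet stands on its own.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Made-up illustrative points, roughly linear.
xs = [1, 2, 3, 4, 5]
ys = [3.2, 4.9, 7.1, 8.8, 11.0]

# Least-squares slope and intercept for y = m*x + c.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
c = y_bar - m * x_bar

# Points on the fitted line at each observed x.
fitted = [m * x + c for x in xs]

fig, ax = plt.subplots()
ax.scatter(xs, ys, label="data")
ax.plot(xs, fitted, color="red", label="best fit line")
ax.set_xlabel("Years of Experience")
ax.set_ylabel("Salary")
ax.legend()
fig.savefig("best_fit_line.png")  # or plt.show() in an interactive session
```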

Yippee, we are done finding the best fit line and fitting it to the data points.

So now, to predict the salary of a person with 7 years of experience, I just substitute x = 7 into the equation of the line.
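The prediction step is a single evaluation of the line, as in this small sketch. The `predict` helper is a hypothetical name, and the coefficients are placeholders for illustration, not the values fitted on the salary dataset.

```python
def predict(x, m, c):
    """Evaluate the fitted line y = m*x + c at x."""
    return m * x + c


# Placeholder coefficients for illustration only; use the m and c
# obtained from your own fit.
m, c = 2.0, 1.0

years_of_experience = 7
print(predict(years_of_experience, m, c))  # 15.0
```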

Linear Regression is one of the first and most basic algorithms in machine learning. Believe it or not, this is all there is to coding a linear regression from scratch.
Happy Coding :)
