Linear Regression Using Scikit Learn in Python

Anjali Pal
Published in Analytics Vidhya
3 min read · Sep 19, 2020

Before starting with linear regression, let’s discuss the types of machine learning algorithms. There are mainly three types: supervised, unsupervised and reinforcement learning.

Supervised machine learning means that the algorithm is first trained on a labelled dataset (training data) to form a model. The model is then given a new set of data (test data) and uses the acquired knowledge to predict the outcomes. For instance, when we were babies, our parents told us what a cow, rabbit, snake, squirrel, etc. looked like (they trained us). After that, whenever we saw an animal (test data), we were able to classify it based on that knowledge.

Supervised ML algorithms are of 2 types:

1. Classification: the output variable is a category (categorical target variable)

2. Regression: the output variable is a numeric value (continuous target variable)

Regression comes in various types, such as Linear, Logistic, Ridge, Lasso and Polynomial. Note that logistic regression, despite its name, is typically used for classification.

For this article, we will be talking about Linear Regression.

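Linear regression is the simplest form of regression. It assumes that the independent and dependent variables are linearly related. The mathematical model looks like:

y = β0 + β1·x + ε

where y is the dependent variable, x is the independent variable, β0 is the intercept, β1 is the slope and ε is the error term.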

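If there is more than one independent variable, it is called multiple linear regression and is represented as:

y = β0 + β1·x1 + β2·x2 + … + βn·xn + ε

where x1, …, xn are the independent variables, β0 is the intercept, β1, …, βn are the coefficients and ε is the error term.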

Statistical estimation, typically the least-squares method, is used to determine the coefficients and form the regression equation.
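As a rough illustration of how the coefficients can be determined, here is a minimal sketch of the least-squares formulas for a single independent variable. The data is made up for this example (it follows y = 2 + 3x exactly) and is not from the project discussed in this article.

```python
import numpy as np

# Hypothetical data generated from y = 2 + 3x with no noise,
# so the recovered coefficients are easy to check by hand.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x

# Least-squares estimates for simple linear regression:
#   slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   intercept = mean_y - slope * mean_x
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

Libraries such as scikit-learn perform this estimation for you, so in practice you rarely write these formulas by hand.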

Now, let’s move to the coding part, that is, building the model.

I did a project as part of my internship. I’ll share the code and explain the algorithm with it.

The initial steps include importing necessary libraries and reading data.
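A minimal sketch of this step might look like the following. The column names `x` and `y` and the inline CSV text are placeholders chosen for this example, since the actual data file is not shown here; with a real file you would pass its path to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real data file.
csv_text = """x,y
1.0,5.1
2.0,7.9
3.0,11.2
4.0,13.8
5.0,17.1
"""

# pd.read_csv accepts a file path or a file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)            # (rows, columns)
print(data.head())           # first few rows
print(data.isnull().sum())   # count of missing values per column
```

Looking at the shape, the first rows and the missing-value counts is a quick way to confirm the data loaded as expected before going further.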

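After this, we must check the assumptions. For linear regression, the dependent and independent variables must be linearly related, so we make a scatter plot to check this. Since there is only one independent variable, multicollinearity is not a concern. The remaining assumptions of linear regression, such as constant variance and normality of the residuals, are not checked in this article.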
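The linearity check could be sketched as below, assuming matplotlib is installed. The data is synthetic (a noisy straight line made up for this example), and the Pearson correlation is added here only as a numeric companion to the plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: a straight line plus random noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=x.size)

# Scatter plot of the independent variable against the dependent variable.
fig, ax = plt.subplots()
ax.scatter(x, y)
ax.set_xlabel("x (independent variable)")
ax.set_ylabel("y (dependent variable)")
ax.set_title("Scatter plot to check linearity")
fig.savefig("scatter.png")

# A Pearson correlation close to +1 or -1 suggests a strong linear relationship.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson correlation: {r:.3f}")
```

If the points fall roughly along a straight line, a linear model is a reasonable choice. A curved pattern would suggest trying a different model or transforming the variables.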

Once we are sure which algorithm to use, we prepare the data and split it into training and testing sets.
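One way to do this with scikit-learn is sketched below. The data is made up for this example, and the 80/20 split and `random_state=0` are illustrative choices rather than the settings used in the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 points on the line y = 2 + 3x.
x = np.arange(1, 21, dtype=float)
y = 2.0 + 3.0 * x

# scikit-learn expects features as a 2D array of shape (n_samples, n_features).
X = x.reshape(-1, 1)

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

Keeping the test set separate means the model is later evaluated on data it has never seen during training.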

After splitting, we first train the model on the training data.
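Training with scikit-learn's `LinearRegression` might look like this sketch. The synthetic data (a noisy line with intercept 2 and slope 3) is an assumption made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: y = 2 + 3x plus random noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2.0 + 3.0 * X.ravel() + rng.normal(scale=1.0, size=50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the model on the training set only.
model = LinearRegression()
model.fit(X_train, y_train)

# The learned intercept and slope form the regression equation.
print("intercept:", model.intercept_)
print("slope:", model.coef_[0])
```

The fitted intercept and slope should land close to the values used to generate the data, although not exactly on them because of the noise.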

After training, we test the resulting model on the test data. From the predictions and the actual values of the test data, we evaluate the model to check how well it is doing. The evaluation metrics depend on the problem. For regression, two common choices are the R-squared value, which shows how well the model fits the data, and the mean absolute error, which shows how much, on average, our predictions deviate from the actual values.
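The evaluation step could be sketched as follows, using `r2_score` and `mean_absolute_error` from `sklearn.metrics`. As before, the data is synthetic and only meant to show the calls involved.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: y = 2 + 3x plus random noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2.0 + 3.0 * X.ravel() + rng.normal(scale=1.0, size=50)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# Predict on the held-out test set and compare with the actual values.
y_pred = model.predict(X_test)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

print(f"R squared: {r2:.3f}")
print(f"Mean absolute error: {mae:.3f}")
```

An R-squared value close to 1 means the model explains most of the variation in the test data, while the mean absolute error is expressed in the same units as the target variable.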

Always remember to evaluate the model.

This was an example of the simplest algorithm. In my next articles, I will explain some more complex algorithms.

Link to Linear Regression algorithm: Github

If you have suggestions for articles or want me to make some edits, please feel free to contact me through LinkedIn or leave a comment here.

Anjali Pal
Analytics Vidhya

A data science enthusiast who believes that “It is a capital mistake to theorize before one has data”- Sherlock Holmes. Visit me at https://anjali001.github.io