Linear Regression – A Step Towards Predictive Analytics

Darshana Singh
Published in The Startup
4 min read · Apr 20, 2020

Linear Regression, a poster child of predictive modeling, helps a statistician open up the black box of machine learning while solidifying their understanding of applied statistics.

image source: LinkedIn

If you are reading this article, I hope you are familiar with the big universe of Data Science. Just as our universe contains galaxies, stars, and planets, the universe of Data Science contains many different algorithms. From today onwards, we will discuss one Data Science algorithm in each post.

As a data science aspirant, it’s very important to understand a few algorithms. This post is entirely dedicated to one of the most popular & well-understood algorithms, Linear Regression. Before jumping into anything, it is crucial to understand the 3 W’s: What? Why? & How?

These three questions will give you an overall picture of the algorithm and help you understand how it works & how you can best use it for your problem statement.

What is a Linear Regression?

Linear regression falls under the category of supervised learning. The overall idea is to model the relationship between variables so that one of them can be predicted from the others. The variable being predicted is called the dependent variable. The variables used to predict it are called the independent variables.

There are two types of Linear Regression:

1. Simple Linear Regression (one dependent variable and one independent variable)

2. Multiple Linear Regression (one dependent variable and multiple independent variables)

Why should we use it?

If our problem statement calls for predicting a continuous numeric value, and the relationship with the predictors looks roughly linear, linear regression is a good place to start. A quick walkthrough of the dataset will help us decide whether it fits Simple Linear Regression or Multiple Linear Regression.

How should we use it?

Let’s try to understand Simple Linear Regression first. There are ample data sets available publicly that you can go through. I found a brilliant example on a website: a simple case of predicting house prices based on area. So we have two columns, i.e. Area & Price.

Now, this CSV file will act as our training data. Do not forget to upload your CSV file to your Jupyter notebook environment.
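Reading the file could look something like the sketch below. The column names `area` and `price` and every value in it are made up for illustration, and the CSV text is held in a string only so the snippet runs without a file on disk. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Made-up illustrative rows, NOT the dataset used in this article.
# With a real file, replace io.StringIO(csv_text) with the file path.
csv_text = """area,price
1000,200000
1500,290000
2000,410000
2500,500000
3000,600000
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first few rows
print(df.shape)    # (number of rows, number of columns)
```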

Since we are now implementing ML algorithms, we are expected to be comfortable with Python’s libraries & with exploratory data analysis in Python.

In case you have missed out on these, you can go through my previous article to get a fair sense.

After reading the file, we will analyze the data set. There are multiple ways to analyze the data; the right one depends on your data set and how much data cleaning it requires.

Here, we have a very small dataset, so we will do only some basic Exploratory Data Analysis.

This step will help us understand how suitable the data is for Linear Regression.
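A basic check of this kind might look like the sketch below. It assumes a DataFrame with hypothetical `area` and `price` columns, filled with made-up values. A correlation close to 1 or -1 suggests that a straight line is a reasonable fit.

```python
import pandas as pd

# Made-up illustrative values, NOT the dataset used in this article.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "price": [200000, 290000, 410000, 500000, 600000],
})

# Summary statistics: count, mean, std, min, quartiles, max.
print(df.describe())

# Missing values per column; anything above zero needs cleaning first.
missing = df.isnull().sum()
print(missing)

# Pearson correlation between the predictor and the target.
corr = df["area"].corr(df["price"])
print(f"correlation between area and price: {corr:.4f}")
```

A scatter plot of area against price is a useful companion to these numbers if you have a plotting library at hand.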

After getting this heads-up, we will now train the model on the dataset. Once your model is trained, you are good to go for the prediction. A good statistician is one who can connect conceptual knowledge to a real-world problem. Following that lead, we will try to visualize what a linear equation looks like.

As we know, every simple linear regression fits the equation Y = mx + b
where,
m = slope (the coefficient of x)
b = intercept (the constant term)
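To make the equation concrete, here is a minimal sketch that computes the line by hand with the standard least-squares formulas, taking m as the slope and b as the intercept. The numbers are made up for illustration, and this is only one way to arrive at the line.

```python
# Made-up illustrative values, NOT the dataset used in this article.
areas = [1000, 1500, 2000, 2500, 3000]
prices = [200000, 290000, 410000, 500000, 600000]

n = len(areas)
mean_x = sum(areas) / n
mean_y = sum(prices) / n

# Least-squares estimates for Y = m*x + b:
#   m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
#   b = mean_y - m * mean_x
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(areas, prices))
s_xx = sum((x - mean_x) ** 2 for x in areas)

m = s_xy / s_xx
b = mean_y - m * mean_x

print(f"slope m = {m}, intercept b = {b}")
```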

Now, let’s see how we can get this in our programming environment.
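One common way to do this is with scikit-learn's `LinearRegression`, sketched below on made-up values with hypothetical column names `area` and `price`. After fitting, `coef_` holds the slope and `intercept_` holds the intercept of the line.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up illustrative values, NOT the dataset used in this article.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "price": [200000, 290000, 410000, 500000, 600000],
})

# scikit-learn expects a 2-D feature table, hence the double brackets.
X = df[["area"]]
y = df["price"]

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # m in Y = m*x + b
intercept = model.intercept_  # b in Y = m*x + b

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```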

Our model is ready now. We can use it to predict prices for new data, including much larger datasets. Let’s give it a shot.
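Prediction on unseen areas might look like the sketch below. The training values and the new areas are made up for illustration, and the model is refitted here only so that the snippet runs on its own.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up illustrative training values, NOT the article's dataset.
train = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000],
    "price": [200000, 290000, 410000, 500000, 600000],
})

model = LinearRegression()
model.fit(train[["area"]], train["price"])

# Hypothetical new houses for which we only know the area.
new_houses = pd.DataFrame({"area": [1200, 2750]})

# predict() applies Y = m*x + b to every row of the new data.
new_houses["predicted_price"] = model.predict(new_houses[["area"]])

print(new_houses)
```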

There are various ways to split data for training & testing. In some cases, we divide the existing dataset into training & testing portions, whereas in the above scenario our training & test data are in different files.
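For the first approach, a minimal sketch with scikit-learn's `train_test_split` is shown below. The 80/20 split, the `random_state`, and the data values are all assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Made-up illustrative values, NOT the dataset used in this article.
df = pd.DataFrame({
    "area": [1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 5500],
    "price": [205000, 298000, 405000, 495000, 610000,
              690000, 805000, 910000, 995000, 1100000],
})

X = df[["area"]]
y = df["price"]

# Hold out 20% of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate on rows the model has not seen during training.
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"train rows: {len(X_train)}, test rows: {len(X_test)}")
print(f"mean absolute error on the test rows: {mae:.2f}")
```

With a dataset this small the error estimate is very rough; the split becomes more informative as the number of rows grows.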

Python also lets us export the data to a CSV or Excel file.
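With pandas, the export could look like the sketch below. The file name `predictions.csv`, the temporary folder, and the values are all hypothetical. The file is read back only to show that the round trip works.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up illustrative predictions, NOT results from this article.
predictions = pd.DataFrame({
    "area": [1200, 2750],
    "predicted_price": [238400.0, 551500.0],
})

# Write to a temporary folder so the sketch leaves no files behind in
# your working directory; in practice you would pick your own path.
out_path = Path(tempfile.mkdtemp()) / "predictions.csv"

# index=False keeps the row numbers out of the file.
predictions.to_csv(out_path, index=False)

# Read the file back to confirm what was written.
round_trip = pd.read_csv(out_path)
print(round_trip)

# For Excel, DataFrame.to_excel works the same way, but it needs an
# extra engine such as openpyxl to be installed.
```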

So, this was a run-through of Simple Linear Regression. It should help you understand the concept and brush up on the basics. I hope it motivates you to explore more of this beautiful world of statistics. We will unfold the next chapter of regression in the next article. Till then stay tuned & EAT, SLEEP & PRACTICE!

P.S. Please feel free to contact me in case of any queries. I will be more than happy to help :)
