Linear Regression with Practical Implementation

Published in The Art of Data Scicne, Jun 15, 2018

In this chapter, we discuss Linear Regression, a supervised machine learning algorithm used for regression problems.

This chapter spans three parts:

What is a Linear Regression?
How does Linear Regression Work?
Practical Implementation of Linear Regression in Scikit Learn

1. What is a Linear Regression?

Simple linear regression is a type of regression analysis with a single independent variable (x) and a linear relationship between that variable and the dependent variable (y).

The blue line in the graph is referred to as the best fit straight line. Based on the given data points, we try to draw the line that models the points best. The line can be described by the linear equation y = mx + b, where m is the slope and b is the y-intercept.

The objective of the linear regression algorithm is to find the best values for m and b.

2. How does Linear Regression Work?

The example dataset has 6 instances and two attributes, X and Y. Here, X is the independent variable and Y is the dependent variable.

Solution:

Linear regression fits a straight-line equation, which is:

y = mx + b    (1)

where:

y = predicted output

x = input

m = Slope or Gradient (how steep the line is)

b = the Y Intercept (where the line crosses the Y-axis)

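So, to find m and b, we first compute the sums Σx, Σy, Σxy and Σx² over the n data points.

Slope or Gradient (m):

m = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²)    (2)

Y Intercept (b):

b = (Σy − mΣx) / n    (3)

Putting the values of m and b into equation (1), we get:

y = 0.3852x + 65.142    (4)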

Finally, we have fitted the straight-line model to our dataset. To predict y, we simply put the value of x into the equation and obtain the predicted output.

For instance, if we want the predicted output at x = 200, we simply set x = 200 in equation (4):

y = 0.3852 × 200 + 65.142 ≈ 142.18

So the predicted output at x = 200 is 142.18.
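The calculation above can be sketched in a few lines of plain Python, using the standard least-squares formulas for the slope and intercept. The six (x, y) pairs are an assumption chosen for illustration, since the data table is not reproduced in the text, and the function name fit_line is hypothetical.

```python
# Illustrative data only: these six (x, y) pairs are an assumption,
# not values quoted from the article text.
xs = [43, 21, 25, 42, 57, 59]
ys = [99, 65, 79, 75, 87, 81]


def fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope: m = (n*Sxy - Sx*Sy) / (n*Sxx - Sx^2)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: b = (Sy - m*Sx) / n
    b = (sum_y - m * sum_x) / n
    return m, b


m, b = fit_line(xs, ys)
y_at_200 = m * 200 + b  # predicted output at x = 200
print(f"m = {m:.4f}, b = {b:.3f}, y(200) = {y_at_200:.2f}")
```

With these assumed points, the slope comes out near 0.385 and the intercept near 65.14, so the prediction at x = 200 lands close to the value worked out by hand.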

2.1 Correlation:

Correlation measures how strongly two variables are related. Correlations are useful because if you can find out what relationship variables have, you can make predictions about future behavior. Knowing what the future holds is important in areas like government and healthcare. Businesses also use these statistics for budgets and business plans.

For our dataset, there is a 52% correlation between x and y.

Finally:

y = 0.3852x + 65.142

R = 52%
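As a rough check on the correlation figure, here is a minimal sketch of the Pearson correlation coefficient in plain Python. It reuses the same six illustrative (x, y) pairs, which are an assumption and not data quoted from the text. The function name pearson_r is hypothetical.

```python
import math

# Illustrative data only (assumed, not quoted from the article text)
xs = [43, 21, 25, 42, 57, 59]
ys = [99, 65, 79, 75, 87, 81]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = pearson_r(xs, ys)
print(f"r = {r:.4f}")
```

For these assumed points, r comes out a little under 0.53, which is in line with the roughly 52% correlation quoted above.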

Note: This article is also available on my academia.edu profile.

3. Practical Implementation of Linear Regression in Scikit Learn.

Dataset Description:

This dataset has 30 instances and two attributes: years of experience (X) and salary (Y). Years of experience is the independent attribute and salary is the dependent attribute.

Part 1: Data Preprocessing:

1.1 Import the Libraries

In this step, we import three libraries for the data preprocessing part. A library is a tool that you can use to do a specific job. First, we import the numpy library, used for multidimensional arrays. Then we import the pandas library, used to import the dataset. Finally, we import the matplotlib library, used for plotting graphs.

1.2 Import the Dataset

In this step, we import the dataset using the pandas library. After importing the dataset, we define our dependent and independent variables. The independent variable is the years of experience attribute, which we call ‘X’, and salary is the dependent attribute, which we call ‘y’.
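A minimal sketch of this step might look like the following. The column names and the five rows are placeholders invented for the example, and the data is read from an in-memory string so the snippet runs on its own. With a real file you would pass its path to pd.read_csv instead.

```python
import io

import pandas as pd

# Stand-in for the real CSV file: column names and values are placeholders
csv_text = """YearsExperience,Salary
1.1,39000
2.0,43500
3.2,55000
4.5,61000
5.9,81000
"""

# With a file on disk, pass its path here instead of the StringIO object
dataset = pd.read_csv(io.StringIO(csv_text))

# All columns except the last form X; the last column is y
X = dataset.iloc[:, :-1].values  # independent variable, shape (n_samples, 1)
y = dataset.iloc[:, -1].values   # dependent variable, shape (n_samples,)

print(X.shape, y.shape)
```

Keeping X two-dimensional is a deliberate choice here, because Scikit Learn estimators expect the features as a 2D array even when there is only one feature.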

1.3 Split the dataset for test and train

In this step, we split our dataset into a training set and a test set: 67% of the dataset is used for training and the remaining 33% for testing.
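One way to do this split is with train_test_split from Scikit Learn, sketched below. The 30 data points are synthetic placeholders generated for the example, not the salary dataset itself, and random_state=0 is an assumption made only so that the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with 30 instances (values are made up for the example)
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.5, 30).reshape(-1, 1)                 # years of experience
y = 25000 + 9500 * X.ravel() + rng.normal(0, 3000, size=30)   # salary

# Hold out 33% of the instances for testing, keep the rest for training
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

print(len(X_train), len(X_test))
```

With 30 instances, this gives 20 training instances and 10 test instances.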

Part 2: Building the Linear Regression model:

In this part, we build our model using the Scikit Learn library.

2.1 Import the Libraries

In this step, we start building our linear regression model. To do this, we first import the Linear Regression model from the Scikit Learn library.

2.2 Initialize our Linear Regression model

In this step, we initialize our linear regression model.

2.3 Fitting the Linear Regression Model

In this step, we fit our model to the training data, X_train and y_train.
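The three steps of this part (import, initialize, fit) can be sketched as follows. The data is again a synthetic placeholder rather than the real salary dataset, and the variable name regressor is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression      # 2.1 import the model
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the salary dataset (values are made up)
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.5, 30).reshape(-1, 1)
y = 25000 + 9500 * X.ravel() + rng.normal(0, 3000, size=30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)

regressor = LinearRegression()       # 2.2 initialize the model
regressor.fit(X_train, y_train)      # 2.3 fit it to the training data

# The learned slope (m) and intercept (b) of the fitted line
print("m =", regressor.coef_[0])
print("b =", regressor.intercept_)
```

After fitting, coef_ and intercept_ play the same role as m and b in the hand calculation from section 2.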

Part 3: Making the Prediction and Visualizing the result:

In this part, we make predictions for our test set and visualize the result using the matplotlib library.

3.1 Predict the test set Result

In this step, we predict the results for our test set.
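A minimal sketch of the prediction step is shown below, assuming the same synthetic placeholder data and the hypothetical variable names used for illustration (regressor, y_pred).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the salary dataset (values are made up)
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.5, 30).reshape(-1, 1)
y = 25000 + 9500 * X.ravel() + rng.normal(0, 3000, size=30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
regressor = LinearRegression().fit(X_train, y_train)

# Predict the salary for every instance in the test set
y_pred = regressor.predict(X_test)

# Compare actual and predicted values side by side
for years, actual, predicted in zip(X_test.ravel(), y_test, y_pred):
    print(f"{years:5.2f} years: actual {actual:9.0f}, predicted {predicted:9.0f}")
```

Printing the actual and predicted values next to each other is a quick way to see how far off the model is on data it has not seen.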

3.2 Visualize our Test Set Result

In the visualizing step, we first draw a scatter plot of our test dataset. The red spots are the test data points, and the blue line is the model's prediction for the test set.
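The plot described above could be produced roughly like this. The data is a synthetic placeholder, the axis labels and the output file name test_set_result.png are assumptions, and the Agg backend is selected only so that the sketch also runs on a machine without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the salary dataset (values are made up)
rng = np.random.default_rng(0)
X = np.linspace(1.0, 10.5, 30).reshape(-1, 1)
y = 25000 + 9500 * X.ravel() + rng.normal(0, 3000, size=30)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0
)
regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)

fig, ax = plt.subplots()
ax.scatter(X_test, y_test, color="red")   # red spots: actual test data
ax.plot(X_test, y_pred, color="blue")     # blue line: predictions on the test set
ax.set_title("Salary vs Experience (test set)")
ax.set_xlabel("Years of experience")
ax.set_ylabel("Salary")
fig.savefig("test_set_result.png")        # hypothetical output file name
```

Because all predictions lie on the same fitted line, the blue line is straight regardless of the order of the test points.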

If you want the dataset and code, you can also check my Github profile.

End Notes:

If you liked this article, be sure to click ❤ below to recommend it and if you have any questions, leave a comment and I will do my best to answer.

To learn more about the world of machine learning, follow me. It’s the best way to find out when I write more articles like this.

You can also follow me on Github for the code and dataset, on Academia.edu for this article, and on Twitter. You can also email me directly or find me on LinkedIn. I’d love to hear from you.

That’s all folks, Have a nice day :)