Linear Regression Using Scikit-Learn Library

Jose Dominguez
Digital Studio Stream
2 min read · Sep 29, 2020

Linear regression is one of the simplest machine learning algorithms and is often the first one people study. In this post, I explain what linear regression is and demonstrate how to implement a model in Python using the scikit-learn library.

What Is Linear Regression?

Linear regression is a linear approach to modeling the relationship between two variables, one independent and one dependent. Linear models are simple and often provide a sufficient amount of detail about how the inputs affect the output. For prediction purposes, they can sometimes outperform fancier nonlinear models, especially in situations with small numbers of training cases or noisy data.
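To make the idea concrete, here is a minimal sketch of fitting a line by ordinary least squares in plain Python. The toy data and the helper name fit_line are assumptions made up for illustration; they are not part of the dataset used later in this post.

```python
# Toy data (made up for illustration): y is exactly 2*x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Because the toy points lie exactly on a line, the fit recovers that line's slope and intercept. With real data the points scatter, and the same formulas give the line that minimizes the squared vertical distances.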

Importing Libraries

#importing libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

Reading the Dataset

df = pd.read_csv("data.csv")
df.head()

Implementing Linear Regression Model

Y = mX + b

Here, m is the slope and b is the intercept. We will fit this equation to the given dataset and then use it to predict the value of Y for any given value of X. To do that, we first have to determine the slope and intercept values. Scikit-learn lets us easily create a linear regression model and determine those values as follows:

#separating the input features and output from dataset
X_data = df.TV.values.reshape(-1,1)
Y_data = df.sales
#fitting the model
model = LinearRegression().fit(X_data, Y_data)
#getting the slope (coefficient) and intercept values
slope = model.coef_
intercept = model.intercept_
print("Slope =",slope,"\nIntercept =",intercept)
Slope = [0.04753664]
Intercept = 7.032593549127695
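Once a model is fitted, it can predict Y for a new X. The sketch below is only an illustration under stated assumptions: it uses a small synthetic dataset (not the post's data.csv), and names such as X_demo, demo_model, and X_new are hypothetical. It shows that predict() gives the same number as plugging the fitted values into Y = mX + b by hand.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (an assumption, not the article's dataset):
# y follows 0.05*x + 7 exactly.
X_demo = np.array([[0.0], [50.0], [100.0], [150.0], [200.0]])
y_demo = 0.05 * X_demo.ravel() + 7.0

demo_model = LinearRegression().fit(X_demo, y_demo)

# coef_ holds one entry per feature; with a 1-D target, intercept_ is a scalar.
m = demo_model.coef_[0]
b = demo_model.intercept_

# predict() expects a 2-D array of shape (n_samples, n_features).
X_new = np.array([[120.0]])
pred = demo_model.predict(X_new)[0]

# The same prediction computed by hand from Y = mX + b.
manual = m * 120.0 + b
print(pred, manual)
```

Note that coef_ is an array even with a single feature, which is why the fitted coefficient prints inside square brackets.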

Plotting

#linear relationship
Y_pred = slope*X_data + intercept
#plotting the actual values
plt.scatter(X_data,Y_data, color="blue")
#plotting predicted values
plt.plot(X_data,Y_pred, color="red")
#set labels for axes
plt.xlabel('TV')
plt.ylabel('Sales')
#display the figure (needed when running as a script)
plt.show()

Linear regression can be used to make simple predictions. It won't be the best choice for every dataset because it assumes that the relationship between the inputs and the output is linear, but it can give you insight into the data you are handling.
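As a rough illustration of that limitation, the sketch below fits a line to two synthetic datasets, one linear and one curved, and compares the R² value returned by the model's score() method. The data and variable names are made up for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic inputs, symmetric around zero (made up for illustration).
x = np.linspace(-3, 3, 50).reshape(-1, 1)

# One target that is linear in x, and one that is not.
y_linear = 2.0 * x.ravel() + 1.0
y_curved = x.ravel() ** 2

# score() returns R^2: near 1 means the line explains the data well,
# near 0 means it does no better than predicting the mean.
r2_linear = LinearRegression().fit(x, y_linear).score(x, y_linear)
r2_curved = LinearRegression().fit(x, y_curved).score(x, y_curved)
print(r2_linear, r2_curved)
```

On the curved data the straight line explains almost none of the variation, even though Y depends entirely on X. A low score like this is a hint that a different model may be needed.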

Jose Dominguez
Digital Studio Stream

Currently a student at Rutgers University — Newark, studying Applied Physics & Computer Science with a concentration in Mathematics. | Website: www.josedom.net