Linear Regression Types and Implementation!

Venkatesha Prasad S
Published in Analytics Vidhya · Jun 11, 2020 · 5 min read


In this article, we will look at the types of Linear Regression and how to implement them using the scikit-learn library. We will also cover a special type of Linear Regression known as Polynomial Regression.

Linear Regression is usually the first model anyone implements when they start to learn about Machine Learning. It is a very useful model: it is easy to interpret, fast to train, and serves as a strong baseline for many regression tasks.

When is Linear Regression used? What are the different types of Linear Regression?

Linear Regression is a supervised model used for regression (prediction of continuous values); it cannot be used for classification problems. It is a simple model that can be understood and implemented very easily, and despite its simplicity it has proven useful in a large number of situations.

Different types of Linear Regression:

  • Simple Linear Regression
  • Multiple Linear Regression
  • Polynomial Regression

SIMPLE LINEAR REGRESSION:

Simple Linear Regression (PC: Wikipedia)

In Simple Linear Regression, we model the relationship between two continuous variables: a dependent variable and an independent variable. The dependent variable (denoted ‘y’) depends on the independent variable (denoted ‘x’), and the fitted regression line is a straight line.

Simple Linear Regression is denoted by:

y = b0 + b1*x

Here,
  • y — Dependent variable
  • x — Independent variable
  • b1 — Coefficient of the independent variable
  • b0 — Intercept

If the coefficient b1 is negative, the relationship between the independent variable and the dependent variable is inversely proportional, or negatively correlated: when one increases, the other decreases, and vice versa.
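
As a quick illustration, here is a minimal sketch (with made-up numbers, not from any dataset used below): fitting points generated from y = 30 - 2*x recovers a negative coefficient.

import numpy as np
from sklearn.linear_model import LinearRegression
#Toy data generated from y = 30 - 2*x
X = np.array([[1], [2], [3], [4]])
y = np.array([28, 26, 24, 22])
model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   #~[-2.] 30.0 : negative slope, inverse relation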

Implementation of Simple Linear Regression:

#Importing the Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#Importing the Data
data = pd.read_csv('Salary_Data.csv')
#Separating the target and data values
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
#Splitting the Dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 3)
#Training the data
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
#Predicting the values
y_pred = regressor.predict(X_test)
#Checking the efficiency of the model
from sklearn import metrics
print('MSE :', metrics.mean_squared_error(y_test, y_pred))
print('R2  :', metrics.r2_score(y_test, y_pred))
print('MAE :', metrics.mean_absolute_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
#Getting the Coefficient and intercept of the Regression Line
print('Coefficient:', regressor.coef_)
print('Intercept:', regressor.intercept_)
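
As an optional extra step, we can visualize the fitted line against the test data. This is a minimal sketch that assumes the X_test, y_test and regressor variables from the snippet above; Salary_Data.csv is commonly an experience-vs-salary dataset, so adjust the axis labels to your own columns.

#Visualizing the fitted line on the test set
plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_test, regressor.predict(X_test), color = 'blue')
plt.title('Simple Linear Regression (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()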

Note: The implementation steps may vary depending on the type of data in the dataset.

MULTIPLE LINEAR REGRESSION:

In Multiple Linear Regression, there is more than one independent variable. Multiple Linear Regression is represented as:

y = b0 + b1*x1 + b2*x2 + … + bn*xn

  • y — Dependent variable
  • x1, x2, …, xn — Independent variables
  • b0, b1, …, bn — Coefficients

The relationship between each independent variable and the dependent variable can be either positive or negative. For example, x1 and x2 may be positively related to y while other independent variables are negatively related: in the toy equation y = 4 + 2*x1 - 3*x2, y rises with x1 but falls with x2.

Implementation Of Multiple Linear Regression:

#Importing the Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#Importing the Data
data = pd.read_csv('Startups.csv')
#Separating the target and data values
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
#Encoding the Categorical Data
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers = [('encoder', OneHotEncoder(), [3])], remainder='passthrough')
X = np.array(ct.fit_transform(X))
#Splitting the Dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 3)
#Training the data
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
#Predicting the values
y_pred = regressor.predict(X_test)
#Printing the predicted and actual values
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))
#Checking the efficiency of the model
from sklearn import metrics
print('MSE :', metrics.mean_squared_error(y_test, y_pred))
print('R2  :', metrics.r2_score(y_test, y_pred))
print('MAE :', metrics.mean_absolute_error(y_test, y_pred))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

In this implementation there is an additional encoding step: the dataset contains a column of categorical variables, which is encoded using the one-hot encoding technique. Otherwise, the implementation of Simple Linear Regression and Multiple Linear Regression is the same.
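
To see what the encoder actually does, here is a minimal sketch with a hypothetical two-column feature matrix (the values below are made up; in the Startups data the categorical column sits at index 3 instead):

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
#Hypothetical data: a numeric column and a categorical column at index 1
X_toy = np.array([[100, 'New York'],
                  [200, 'California'],
                  [150, 'New York']], dtype = object)
ct_toy = ColumnTransformer(transformers = [('encoder', OneHotEncoder(), [1])], remainder = 'passthrough')
print(np.array(ct_toy.fit_transform(X_toy)))
#Each category becomes its own 0/1 column; the numeric column passes through unchanged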

POLYNOMIAL REGRESSION:

Polynomial Regression is a special type of Linear Regression in which the relationship between the dependent and independent variables is not linear.

It is denoted by :

y = b0 + b1*x + b2*x² + … + bn*xⁿ

Applications of Polynomial Regression include estimating the infection or death rate during an epidemic or pandemic and predicting the score of a cricket match. It is also used extensively in the chemical industry.

Why is Polynomial Regression called Polynomial ‘Linear’ Regression?

Polynomial Regression is called Polynomial Linear Regression because the model is still linear in its coefficients; only the independent variable terms are non-linear, since they are raised to higher powers. When building the model, we try to find the values of the coefficients b1, b2, …, bn, not the values of the independent variable itself. That is why Polynomial Regression is still called Polynomial Linear Regression.
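
A quick sketch (toy numbers) makes this concrete: PolynomialFeatures only expands the input into power terms, and LinearRegression then fits ordinary linear coefficients on those new columns.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
X_demo = np.array([[2], [3]])
print(PolynomialFeatures(degree = 2).fit_transform(X_demo))
#[[1. 2. 4.]
# [1. 3. 9.]]  -> columns are x^0, x^1, x^2; the model stays linear in b0, b1, b2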

Implementation of Polynomial Regression:

#Importing the Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#Importing the Data
data = pd.read_csv('Position_Salaries.csv')
#Separating the target and data values
X = data.iloc[:, 1:-1].values
y = data.iloc[:, -1].values
#Training the data
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 2)
X_poly = poly_reg.fit_transform(X)
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)
#Visualizing the Polynomial Regression Plot
plt.scatter(X, y, color = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.fit_transform(X)), color = 'blue')
plt.show()
# Predicting a new result with Polynomial Regression
print(lin_reg_2.predict(poly_reg.fit_transform([[6.5]])))
#Getting the Coefficient and intercept of the Regression Line
print('Coefficients:', lin_reg_2.coef_)
print('Intercept:', lin_reg_2.intercept_)

NOTE: Increasing the degree beyond what the data supports may result in over-fitting the model, as illustrated in the sketch below.

Example of Over-fitting using Polynomial Regression
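
To see the note above in action, here is a minimal sketch with synthetic data (not the Position_Salaries dataset): with ten noisy training points, a degree-9 polynomial can pass through them almost exactly, driving the training error towards zero, yet such a curve rarely generalizes to new data.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
#Ten noisy points sampled from a sine curve
rng = np.random.RandomState(0)
X_syn = np.sort(rng.uniform(0, 1, 10)).reshape(-1, 1)
y_syn = np.sin(2 * np.pi * X_syn).ravel() + rng.normal(scale = 0.1, size = 10)
for degree in (2, 9):
    X_poly_syn = PolynomialFeatures(degree = degree).fit_transform(X_syn)
    model = LinearRegression().fit(X_poly_syn, y_syn)
    print(degree, mean_squared_error(y_syn, model.predict(X_poly_syn)))
#Training error shrinks as the degree grows; always evaluate on held-out data
#before trusting a high-degree fit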

Hooray, that’s the end! In this article, we have learnt about the different types of Linear Regression and how to implement them using sklearn.

Thank you for reading this article. If you enjoyed it, please leave some claps to show your appreciation. Follow me for more articles like this. If you have any doubts, queries, or feedback regarding this article, feel free to reach out in the comments section. Have a great day :)
