Multiple Linear Regression and Gradient Descent using Python

Gilang Satria Prayoga
Published in Nerd For Tech · 4 min read · Feb 26, 2022


This post explains linear regression with multiple variables and its implementation in Python.


Before we dive deeper into multiple linear regression, take a detour through simple linear regression in this post.

We’ll use some equations and code from that post.

Multiple Linear Regression

Recall the equation for simple linear regression:

Figure 1. Simple linear regression: $\hat{y}_i = w_1 x_i + w_0$

The predicted label ($\hat{y}$) depends on only a single feature $x$. What if we have a dataset with multiple independent features? In that case, we need multiple linear regression. The equation is:

Figure 2. Multiple linear regression: $\hat{y}_i = w_0 + \sum_{j=1}^{n} w_j x_{ij}$

where:

$\hat{y}_i$ = the predicted label for the i-th sample

$x_{ij}$ = the j-th feature of the i-th sample

$w_0$ = the regression intercept weight

$w_j$ = the regression weight for the j-th feature

Notice that when the label $y$ depends on only one variable $x$, the equation reduces to the simple linear equation $y = w_1 x + w_0$.

We can write multiple linear regression in matrix notation:

Figure 3. Matrix notation for multiple linear regression: $\hat{Y} = XW$, where $X$ is the $m \times (n+1)$ features matrix with a leading column of ones and $W$ is the $(n+1) \times 1$ weights vector.
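As a quick numpy illustration of figure 3 (toy numbers, not from the post):

import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])  # leading column of ones carries the intercept
W = np.array([0.5, 1.0, -2.0])   # [w0, w1, w2]
y_hat = X @ W                    # figure 3: Y_hat = XW
print(y_hat)                     # [-3.5 -5.5]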

Gradient Descent

The cost function to be minimized in multiple linear regression is the mean squared error:

Figure 4. Cost function and its partial derivative: $J(W) = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2$, with $\frac{\partial J}{\partial w_j} = \frac{2}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)\,x_{ij}$

In matrix form, the partial derivative of the cost function can be written as:

Figure 6. Matrix notation for the cost function derivative: $\nabla_W J = \frac{2}{m} X^T (XW - Y)$

The updated weights at the (k+1)-th iteration become:

Figure 7. Learning rule for the weights: $W^{(k+1)} = W^{(k)} - \alpha \, \nabla_W J(W^{(k)})$

where $\alpha$ is the learning rate.
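Here is a single gradient descent update in numpy, matching figures 6 and 7 (toy numbers for illustration):

import numpy as np

X = np.array([[1.0, 2.0],
              [1.0, 3.0]])  # ones column plus one feature
y = np.array([5.0, 7.0])
W = np.zeros(2)
alpha = 0.1

m = X.shape[0]
gradient = (2 / m) * (X.T @ (X @ W - y))  # figure 6
W = W - alpha * gradient                  # figure 7
print(W)                                  # [1.2 3.1]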

Implementation

Recall the model we wrote for simple linear regression:
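The previous post embedded that code as a gist; a minimal sketch of the simple model, assuming gradient descent on the two weights (class and attribute names are my own, not necessarily the original's):

import numpy as np

class LinearRegression:
    def __init__(self, learning_rate=0.01, iters=1000):
        self.learning_rate = learning_rate
        self.iters = iters
        self.w0 = 0.0  # intercept
        self.w1 = 0.0  # slope

    def fit(self, x, y):
        m = len(x)
        for _ in range(self.iters):
            error = (self.w1 * x + self.w0) - y
            # gradient descent on the two weights of y = w1*x + w0
            self.w0 -= self.learning_rate * (2 / m) * error.sum()
            self.w1 -= self.learning_rate * (2 / m) * (error * x).sum()

    def predict(self, x):
        return self.w1 * x + self.w0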

We’re going to update this code so it also works for multiple linear regression cases.

Update the constructor

In the constructor, we just add the weights matrix as a new model property.
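A minimal sketch of the updated constructor (using the same assumed class name as above):

class LinearRegression:
    def __init__(self, learning_rate=0.01, iters=1000):
        self.learning_rate = learning_rate
        self.iters = iters
        self.weights = None  # the weights matrix, set in fit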

Update the fit method by following these steps (a sketch of the full method follows the list):

  1. Modify the features matrix by adding one column with its values equal to 1 for the intercept (w0).
  2. Initialize the weights matrix to zero.
  3. For iterations 1 until n (the class iters property), repeat steps 4–7:
  4. Compute the predicted labels matrix using figure 3.
  5. Compute the error between the known labels matrix and the predicted labels from step 4.
  6. Compute the partial derivative of the cost function using figure 6.
  7. Update the weights matrix using the equation in figure 7.
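A sketch of the updated fit method following those steps, assuming the same class as above:

def fit(self, X, y):
    m = X.shape[0]
    # step 1: add a column of ones so w0 acts as the intercept
    X = np.hstack([np.ones((m, 1)), X])
    # step 2: initialize the weights matrix to zero
    self.weights = np.zeros(X.shape[1])
    # step 3: iterate
    for _ in range(self.iters):
        # step 4 (figure 3): predicted labels, Y_hat = XW
        y_hat = X @ self.weights
        # step 5: error between predictions and known labels
        error = y_hat - y
        # step 6 (figure 6): gradient of the cost, (2/m) * X^T (XW - Y)
        gradient = (2 / m) * (X.T @ error)
        # step 7 (figure 7): learning rule, W := W - alpha * gradient
        self.weights -= self.learning_rate * gradient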

Finally, for the predict method:
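A sketch of predict, which adds the same ones column and applies figure 3:

def predict(self, X):
    m = X.shape[0]
    # add the intercept column used in fit, then Y_hat = XW
    X = np.hstack([np.ones((m, 1)), X])
    return X @ self.weights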

Let’s recap our updated model: the constructor stores the hyperparameters and the weights property, fit runs the gradient descent loop, and predict adds the ones column and applies figure 3 to new data.

Before testing the model on a multiple-variable dataset, let’s test it on the simple linear regression case. We will re-use the test score dataset from the previous post.
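A sketch of that check; the file name and column names are assumptions based on the previous post, and the hyperparameter values are guesses:

import pandas as pd

df = pd.read_csv("datasets/test_scores.csv")  # hypothetical file name
X = df[["hours"]].values                      # hypothetical feature column
y = df["score"].values                        # hypothetical label column

model = LinearRegression(learning_rate=0.01, iters=1000)
model.fit(X, y)

mse = ((model.predict(X) - y) ** 2).mean()
print(mse)  # around 3.30 in the original post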

The output MSE is around 3.30, the same as the previous model.

Let’s test this model on multivariable regression. We’re going to use the housing prices dataset from Kaggle. Download it and save it into the datasets folder.

We’re going to predict SalePrice. Let’s explore the dataset a little bit.
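A sketch of loading the data; the file name is an assumption (the Kaggle competition download ships a train.csv):

import pandas as pd

df = pd.read_csv("datasets/train.csv")  # assumed file name from the Kaggle download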

df["SalePrice"].describe()count      1460.000000
mean 180921.195890
std 79442.502883
min 34900.000000
25% 129975.000000
50% 163000.000000
75% 214000.000000
max 755000.000000
Name: SalePrice, dtype: float64

Let’s see the distribution graph of SalePrice.

Figure 8. SalePrice histogram
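A plot like figure 8 can be reproduced with pandas and matplotlib (the bin count is a guess):

import matplotlib.pyplot as plt

df["SalePrice"].hist(bins=50)  # bin count is a guess
plt.xlabel("SalePrice")
plt.ylabel("Count")
plt.show()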

Most of the prices lie between 100K and 200K, but there appear to be a lot of outliers on the more expensive side.

For now, we don’t want any missing values in our dataset. So let’s check which columns contain missing values.
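One way to produce the summary below (a sketch, not necessarily the post’s exact code):

missing = df.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
print("Columns with missing values:", len(missing))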

PoolQC          1453
MiscFeature     1406
Alley           1369
Fence           1179
FireplaceQu      690
LotFrontage      259
GarageYrBlt       81
GarageType        81
GarageFinish      81
GarageQual        81
GarageCond        81
BsmtFinType2      38
BsmtExposure      38
BsmtFinType1      37
BsmtCond          37
BsmtQual          37
MasVnrArea         8
MasVnrType         8
Electrical         1
Columns with missing values: 19

So there are 19 columns with missing values; we will skip them and use other columns for the regression. We’re going to use columns with a relatively high correlation to SalePrice. Here are some of them:
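The correlation table below can be reproduced with pandas (the column selection is assumed from the table itself):

cols = ["OverallQual", "GrLivArea", "GarageArea", "SalePrice"]
print(df[cols].corr())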

             OverallQual  GrLivArea  GarageArea  SalePrice
OverallQual     1.000000   0.593007    0.562022   0.790982
GrLivArea       0.593007   1.000000    0.468997   0.708624
GarageArea      0.562022   0.468997    1.000000   0.623431
SalePrice       0.790982   0.708624    0.623431   1.000000

Scale the data so gradient descent converges faster, using:

Figure 9. Scaler equation: $x' = \frac{x - \mu}{\sigma}$

where $\mu$ is the mean of the feature and $\sigma$ is its standard deviation.
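A sketch of applying the scaler to the three chosen feature columns; whether the original post also scaled the target is not shown, so here only the features are standardized:

features = ["OverallQual", "GrLivArea", "GarageArea"]
X = df[features].values.astype(float)
y = df["SalePrice"].values.astype(float)

# figure 9: subtract each column's mean and divide by its standard deviation
X = (X - X.mean(axis=0)) / X.std(axis=0)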

And now we can fit our dataset.
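A sketch of the final fit, using the scaled features and the class sketched earlier (the hyperparameter values are guesses, not the post’s exact settings):

model = LinearRegression(learning_rate=0.01, iters=1000)  # hyperparameter values are guesses
model.fit(X, y)

preds = model.predict(X)
mse = ((preds - y) ** 2).mean()
print(model.weights)
print(mse)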

Conclusion

In this article, we have:

  1. Generalized the simple linear equation to the multiple linear equation.
  2. Derived the equations of multiple linear regression, in both algebra and matrix notation.
  3. Implemented gradient descent for multiple linear regression.

Please share this post and give it a clap if you like it.
