Multiple Linear Regression and Gradient Descent using Python

Gilang Satria Prayoga
Published in Nerd For Tech · 4 min read · Feb 26, 2022


This post explains linear regression with multiple variables and its implementation in Python.


Before we dive deeper into multiple linear regression, take a detour through simple linear regression in this post.

We’ll use some equations and code from that post.

Multiple Linear Regression

Recall the equation for simple linear regression:

Figure 1. Simple linear regression: $\hat{y}_i = w_1 x_i + w_0$

The predicted label ($\hat{y}$) depends on only a single feature $x$. What if we have a dataset with multiple independent features? In that case, we need multiple linear regression. The equation is:

Figure 2. Multiple linear regression: $\hat{y}_i = w_0 + \sum_{j=1}^{n} w_j x_{ij}$

where:

$\hat{y}_i$ = the predicted label for the i-th sample

$x_{ij}$ = the j-th feature of the i-th sample

$w_0$ = the regression intercept weight

$w_j$ = the regression weight for the j-th feature

Notice that when the label $y$ depends on only one variable $x$, the equation reduces to the simple linear equation $y = w_1 x + w_0$.

We can write multiple linear regression in matrix notation:

Figure 3. Matrix notation for multiple linear regression: $\hat{Y} = XW$, where $X$ is the $m \times (n+1)$ features matrix with a leading column of ones and $W$ is the $(n+1) \times 1$ weights vector.
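As a quick numpy illustration of figure 3 (toy numbers, not from the post):

import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [1.0, 4.0, 5.0]])  # leading column of ones carries the intercept
W = np.array([0.5, 1.0, -2.0])   # [w0, w1, w2]
y_hat = X @ W                    # figure 3: Y_hat = XW
print(y_hat)                     # [-3.5 -5.5]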

Gradient Descent

The cost function to be minimized in multiple linear regression is the mean squared error:

Figure 4. Cost function and its partial derivative: $J(W) = \frac{1}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)^2$, with $\frac{\partial J}{\partial w_j} = \frac{2}{m}\sum_{i=1}^{m}(\hat{y}_i - y_i)\,x_{ij}$

In matrix form, the partial derivative of the cost function can be written as:

Figure 6. Matrix notation for the cost function derivative: $\nabla_W J = \frac{2}{m} X^T (XW - Y)$

The updated weights at the (k+1)-th iteration become:

Figure 7. Learning rule for the weights: $W^{(k+1)} = W^{(k)} - \alpha \, \nabla_W J(W^{(k)})$

where $\alpha$ is the learning rate.
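Here is a single gradient descent update in numpy, matching figures 6 and 7 (toy numbers for illustration):

import numpy as np

X = np.array([[1.0, 2.0],
              [1.0, 3.0]])  # ones column plus one feature
y = np.array([5.0, 7.0])
W = np.zeros(2)
alpha = 0.1

m = X.shape[0]
gradient = (2 / m) * (X.T @ (X @ W - y))  # figure 6
W = W - alpha * gradient                  # figure 7
print(W)                                  # [1.2 3.1]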

Implementation

Recall the model we wrote for simple linear regression:
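The previous post embedded that code as a gist; a minimal sketch of the simple model, assuming gradient descent on the two weights (class and attribute names are my own, not necessarily the original's):

import numpy as np

class LinearRegression:
    def __init__(self, learning_rate=0.01, iters=1000):
        self.learning_rate = learning_rate
        self.iters = iters
        self.w0 = 0.0  # intercept
        self.w1 = 0.0  # slope

    def fit(self, x, y):
        m = len(x)
        for _ in range(self.iters):
            error = (self.w1 * x + self.w0) - y
            # gradient descent on the two weights of y = w1*x + w0
            self.w0 -= self.learning_rate * (2 / m) * error.sum()
            self.w1 -= self.learning_rate * (2 / m) * (error * x).sum()

    def predict(self, x):
        return self.w1 * x + self.w0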

We’re going to update this code so it also works for multiple linear regression cases.

Update the constructor

In the constructor, we just add the weights matrix as a new model property.
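A minimal sketch of the updated constructor (using the same assumed class name as above):

class LinearRegression:
    def __init__(self, learning_rate=0.01, iters=1000):
        self.learning_rate = learning_rate
        self.iters = iters
        self.weights = None  # the weights matrix, set in fit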

Update the fit method by following these steps (a sketch of the full method follows the list):

  1. Modify the features matrix by adding one column with its values equal to 1 for the intercept (w0).
  2. Initialize the weights matrix to zero.
  3. For iterations 1 until n (the class iters property), repeat steps 4–7:
  4. Compute the predicted labels matrix using figure 3.
  5. Compute the error between the known labels matrix and the predicted labels from step 4.
  6. Compute the partial derivative of the cost function using figure 6.
  7. Update the weights matrix using the equation in figure 7.
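A sketch of the updated fit method following those steps, assuming the same class as above:

def fit(self, X, y):
    m = X.shape[0]
    # step 1: add a column of ones so w0 acts as the intercept
    X = np.hstack([np.ones((m, 1)), X])
    # step 2: initialize the weights matrix to zero
    self.weights = np.zeros(X.shape[1])
    # step 3: iterate
    for _ in range(self.iters):
        # step 4 (figure 3): predicted labels, Y_hat = XW
        y_hat = X @ self.weights
        # step 5: error between predictions and known labels
        error = y_hat - y
        # step 6 (figure 6): gradient of the cost, (2/m) * X^T (XW - Y)
        gradient = (2 / m) * (X.T @ error)
        # step 7 (figure 7): learning rule, W := W - alpha * gradient
        self.weights -= self.learning_rate * gradient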

Finally, for the predict method:
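A sketch of predict, which adds the same ones column and applies figure 3:

def predict(self, X):
    m = X.shape[0]
    # add the intercept column used in fit, then Y_hat = XW
    X = np.hstack([np.ones((m, 1)), X])
    return X @ self.weights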

Let’s recap our updated model: the constructor stores the hyperparameters and the weights property, fit runs the gradient descent loop, and predict adds the ones column and applies figure 3 to new data.

Before testing the model on a multiple-variable dataset, let’s test it on the simple linear regression case. We will re-use the test score dataset from the previous post.
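A sketch of that check; the file name and column names are assumptions based on the previous post, and the hyperparameter values are guesses:

import pandas as pd

df = pd.read_csv("datasets/test_scores.csv")  # hypothetical file name
X = df[["hours"]].values                      # hypothetical feature column
y = df["score"].values                        # hypothetical label column

model = LinearRegression(learning_rate=0.01, iters=1000)
model.fit(X, y)

mse = ((model.predict(X) - y) ** 2).mean()
print(mse)  # around 3.30 in the original post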

The output MSE is around 3.30, the same as the previous model.

Let’s test this model on multivariable regression. We’re going to use the housing prices dataset from Kaggle. Download it and save it into the datasets folder.

We’re going to predict SalePrice. Let’s explore the dataset a little bit.
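A sketch of loading the data; the file name is an assumption (the Kaggle competition download ships a train.csv):

import pandas as pd

df = pd.read_csv("datasets/train.csv")  # assumed file name from the Kaggle download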

df["SalePrice"].describe()count      1460.000000
mean 180921.195890
std 79442.502883
min 34900.000000
25% 129975.000000
50% 163000.000000
75% 214000.000000
max 755000.000000
Name: SalePrice, dtype: float64

Let’s see the distribution graph of SalePrice.

Figure 8. SalePrice histogram
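A plot like figure 8 can be reproduced with pandas and matplotlib (the bin count is a guess):

import matplotlib.pyplot as plt

df["SalePrice"].hist(bins=50)  # bin count is a guess
plt.xlabel("SalePrice")
plt.ylabel("Count")
plt.show()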

Most of the prices lie between 100K and 200K, but there appear to be a lot of outliers on the more expensive side.

For now, we don’t want any missing values in our dataset. So let’s check which columns contain missing values.
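One way to produce the summary below (a sketch, not necessarily the post’s exact code):

missing = df.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
print("Columns with missing values:", len(missing))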

PoolQC          1453
MiscFeature     1406
Alley           1369
Fence           1179
FireplaceQu      690
LotFrontage      259
GarageYrBlt       81
GarageType        81
GarageFinish      81
GarageQual        81
GarageCond        81
BsmtFinType2      38
BsmtExposure      38
BsmtFinType1      37
BsmtCond          37
BsmtQual          37
MasVnrArea         8
MasVnrType         8
Electrical         1
Columns with missing values: 19

So there are 19 columns with missing values; we will skip them and use other columns for the regression. We’re going to use columns with a relatively high correlation to SalePrice. Here are some of them:
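The correlation table below can be reproduced with pandas (the column selection is assumed from the table itself):

cols = ["OverallQual", "GrLivArea", "GarageArea", "SalePrice"]
print(df[cols].corr())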

             OverallQual  GrLivArea  GarageArea  SalePrice
OverallQual     1.000000   0.593007    0.562022   0.790982
GrLivArea       0.593007   1.000000    0.468997   0.708624
GarageArea      0.562022   0.468997    1.000000   0.623431
SalePrice       0.790982   0.708624    0.623431   1.000000

Scale the data so gradient descent converges faster, using:

Figure 9. Scaler equation: $x' = \frac{x - \mu}{\sigma}$

where $\mu$ is the mean of the feature and $\sigma$ is its standard deviation.
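A sketch of applying the scaler to the three chosen feature columns; whether the original post also scaled the target is not shown, so here only the features are standardized:

features = ["OverallQual", "GrLivArea", "GarageArea"]
X = df[features].values.astype(float)
y = df["SalePrice"].values.astype(float)

# figure 9: subtract each column's mean and divide by its standard deviation
X = (X - X.mean(axis=0)) / X.std(axis=0)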

And now we can fit our dataset.
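A sketch of the final fit, using the scaled features and the class sketched earlier (the hyperparameter values are guesses, not the post’s exact settings):

model = LinearRegression(learning_rate=0.01, iters=1000)  # hyperparameter values are guesses
model.fit(X, y)

preds = model.predict(X)
mse = ((preds - y) ** 2).mean()
print(model.weights)
print(mse)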

Conclusion

In this article, we have:

  1. Generalized the simple linear equation to the multiple linear equation.
  2. Derived the equations of multiple linear regression, in both algebra and matrix notation.
  3. Implemented gradient descent for multiple linear regression.

Please share this post and give it a clap if you like it.
