Multiple Linear Regression and Gradient Descent using Python
This post explains linear regression with multiple variables and its implementation in Python.
Before we dive deeper into multiple linear regression, take a detour through simple linear regression in this post.
We'll use some equations and code from that post.
Multiple Linear Regression
Recall the equation for simple linear regression:
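$$\hat{y}_i = w_0 + w_1 x_i$$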
The predicted label (y hat) depends only on a single feature x. What if we have a dataset with multiple independent features? In that case, we need multiple linear regression. The equation is:
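$$\hat{y}_i = w_0 + \sum_{j=1}^{m} w_j x_{ij}$$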
where:
- $\hat{y}_i$ = the predicted label for the i-th sample
- $x_{ij}$ = the j-th feature of the i-th sample
- $w_0$ = the regression intercept weight
- $w_j$ = the regression weight of the j-th feature
- $m$ = the number of features
Notice that when the label y depends on only one feature x, the equation reduces to the simple linear equation $y = w_1 x + w_0$.
We can write the multiple linear regression in matrix notation:
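$$\hat{y} = Xw$$

where $X$ is the $n \times (m+1)$ features matrix whose first column is all ones (so the intercept $w_0$ is treated as just another weight), and $w$ is the $(m+1) \times 1$ weights matrix.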
Gradient Descent
The cost function to be minimized in multiple linear regression is the Mean Squared Error (MSE):
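$$J(w) = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$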
In matrix form, the partial derivative of the cost function can be written as:
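$$\nabla J(w) = -\frac{2}{n} X^T (y - Xw)$$

(The factor of 2 comes from differentiating the square; some texts fold it into the learning rate.)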
The updated weights at iteration k+1 become:
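$$w^{(k+1)} = w^{(k)} - \alpha \, \nabla J\left(w^{(k)}\right)$$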
where α is the learning rate.
Implementation
Recall the model we wrote for the Simple Linear Regression
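Here is a minimal sketch of that model; the exact names (lr, iters, w0, w1) are assumptions, so adjust them to match your version from the previous post:

```python
import numpy as np

class LinearRegression:
    """Simple linear regression y = w1*x + w0, fitted with gradient descent."""

    def __init__(self, lr=0.01, iters=1000):
        self.lr = lr          # learning rate (alpha)
        self.iters = iters    # number of gradient descent iterations
        self.w0 = 0.0         # intercept
        self.w1 = 0.0         # slope

    def fit(self, x, y):
        n = len(x)
        for _ in range(self.iters):
            y_pred = self.w1 * x + self.w0
            error = y - y_pred
            # gradients of the MSE cost with respect to w1 and w0
            self.w1 -= self.lr * (-(2 / n) * np.sum(x * error))
            self.w0 -= self.lr * (-(2 / n) * np.sum(error))
        return self

    def predict(self, x):
        return self.w1 * x + self.w0
```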
We're going to update the code so it also works for multiple linear regression cases.
Update the constructor
In the constructor, we just add the weights matrix as a model property.
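A minimal sketch of the updated constructor (the attribute name weights is an assumption):

```python
def __init__(self, lr=0.01, iters=1000):
    self.lr = lr
    self.iters = iters
    self.weights = None  # weights matrix, initialized inside fit()
```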
Update the fit method by following these steps (a sketch implementing them follows the list):
- Modify the features matrix by adding a column with all values equal to 1, for the intercept (w0).
- Initialize the weights matrix to zero.
- For iterations 1 until n (the class iters property):
  - Compute the predicted labels matrix using the matrix equation $\hat{y} = Xw$.
  - Compute the error between the known labels matrix and the predicted labels.
  - Compute the partial derivative of the cost function using the gradient equation above.
  - Update the weights matrix using the update rule above.
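```python
def fit(self, X, y):
    n_samples = X.shape[0]
    # step 1: add a column of ones so the intercept w0 is learned as a weight
    X = np.hstack([np.ones((n_samples, 1)), X])
    y = y.reshape(-1, 1)
    # step 2: initialize the weights matrix to zero
    self.weights = np.zeros((X.shape[1], 1))
    # step 3: iterate 1..iters
    for _ in range(self.iters):
        y_pred = X @ self.weights                    # step 4: y_hat = Xw
        error = y - y_pred                           # step 5: labels minus predictions
        gradient = -(2 / n_samples) * (X.T @ error)  # step 6: gradient of the MSE cost
        self.weights -= self.lr * gradient           # step 7: update rule
    return self
```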
Finally, update the predict method. It must add the same intercept column that fit added:
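```python
def predict(self, X):
    # add the same column of ones used during fit
    X = np.hstack([np.ones((X.shape[0], 1)), X])
    return X @ self.weights
```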
Let's recap our updated model:
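Putting the pieces together, the updated model looks roughly like this:

```python
import numpy as np

class LinearRegression:
    """Multiple linear regression fitted with gradient descent."""

    def __init__(self, lr=0.01, iters=1000):
        self.lr = lr          # learning rate (alpha)
        self.iters = iters    # number of gradient descent iterations
        self.weights = None   # weights matrix, set in fit()

    def fit(self, X, y):
        n_samples = X.shape[0]
        X = np.hstack([np.ones((n_samples, 1)), X])  # intercept column
        y = y.reshape(-1, 1)
        self.weights = np.zeros((X.shape[1], 1))
        for _ in range(self.iters):
            error = y - X @ self.weights
            gradient = -(2 / n_samples) * (X.T @ error)
            self.weights -= self.lr * gradient
        return self

    def predict(self, X):
        X = np.hstack([np.ones((X.shape[0], 1)), X])
        return X @ self.weights
```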
Before testing the model on a multiple-variable dataset, let's test it on the simple linear regression case. We will re-use the test score dataset from the previous post.
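A sketch of that test; the file and column names below (test_scores.csv, hours, score) are hypothetical, so substitute whatever the previous post's dataset actually uses:

```python
import pandas as pd

df = pd.read_csv("datasets/test_scores.csv")  # hypothetical file name
X = df[["hours"]].values                      # hypothetical feature column, kept 2-D
y = df["score"].values                        # hypothetical label column

model = LinearRegression(lr=0.01, iters=2000).fit(X, y)
y_pred = model.predict(X)
mse = np.mean((y.reshape(-1, 1) - y_pred) ** 2)
print("MSE:", mse)
```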
The output MSE is around 3.30, the same as the previous model's.
Let's test this model on multivariable regression. We're going to use the housing prices dataset from Kaggle. Download it and save it into the datasets folder.
We're going to predict SalePrice. Let's explore the dataset a little bit first.
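Assuming the Kaggle training file was saved as datasets/train.csv:

```python
import pandas as pd

df = pd.read_csv("datasets/train.csv")
```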
df["SalePrice"].describe()count 1460.000000
mean 180921.195890
std 79442.502883
min 34900.000000
25% 129975.000000
50% 163000.000000
75% 214000.000000
max 755000.000000
Name: SalePrice, dtype: float64
Let's see the distribution of SalePrice.
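One way to draw it, with a plain matplotlib histogram:

```python
import matplotlib.pyplot as plt

df["SalePrice"].hist(bins=50)
plt.xlabel("SalePrice")
plt.ylabel("Count")
plt.show()
```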
Most of the prices lie between 100K and 200K, but there appear to be a lot of outliers on the more expensive side.
For now, we don't want any missing values in our dataset, so let's check which columns contain them.
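A short check that reproduces the counts below:

```python
missing = df.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
print("Columns with missing values:", len(missing))
```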
PoolQC 1453
MiscFeature 1406
Alley 1369
Fence 1179
FireplaceQu 690
LotFrontage 259
GarageYrBlt 81
GarageType 81
GarageFinish 81
GarageQual 81
GarageCond 81
BsmtFinType2 38
BsmtExposure 38
BsmtFinType1 37
BsmtCond 37
BsmtQual 37
MasVnrArea 8
MasVnrType 8
Electrical 1

Columns with missing values: 19
So there are 19 columns with missing values; we will skip them and use other columns for the regression. We're going to use columns with relatively high correlation to SalePrice. Here are some of them:
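A quick way to compute them with pandas (the column choice here is the one used below):

```python
cols = ["OverallQual", "GrLivArea", "GarageArea", "SalePrice"]
print(df[cols].corr())
```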
OverallQual GrLivArea GarageArea SalePrice
OverallQual 1.000000 0.593007 0.562022 0.790982
GrLivArea 0.593007 1.000000 0.468997 0.708624
GarageArea 0.562022 0.468997 1.000000 0.623431
SalePrice 0.790982 0.708624 0.623431 1.000000
Scale the data so that gradient descent converges faster, using standardization:
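$$x' = \frac{x - \mu}{\sigma}$$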
where µ is the mean of each feature and σ is its standard deviation.
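In code, a minimal version of the scaling, using the three high-correlation columns from above:

```python
features = ["OverallQual", "GrLivArea", "GarageArea"]
X = df[features].values
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature
y = df["SalePrice"].values
```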
And now we can fit our dataset:
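```python
model = LinearRegression(lr=0.01, iters=2000).fit(X, y)
y_pred = model.predict(X)
mse = np.mean((y.reshape(-1, 1) - y_pred) ** 2)
print("MSE:", mse)
```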
Conclusion
In this article, we have learned:
- How to generalize the simple linear equation to the multiple linear equation
- How to derive the equations of Multiple Linear Regression, in both algebra and matrix notation
- How to implement Gradient Descent for Multiple Linear Regression
Please share this post and give it a clap if you like it.