Deriving the Normal Equation for Multiple Linear Regression

Bhanumathi Ramesh
4 min read · Oct 19, 2021


The mathematical formulation of the Normal Equation

In this article, we will walk through the steps to derive the normal equation for multiple linear regression.

What is Multiple Linear Regression?

Multiple Linear Regression is a supervised machine learning algorithm that uses more than one independent variable to predict a target variable. The table below is an example of a multiple linear regression problem.

Table with 4 independent variables and 1 dependent variable

The above table has the marks and SAT score of 5 students. Each subject is an independent variable (also called a predictor, explanatory variable, or regressor), and the SAT score is the dependent variable (also called the label, target, or response variable). Since we have more than one independent variable, we have to find a coefficient for each of them. In my Simple Linear Regression article, we discussed how to find the coefficients. Now we will extend simple linear regression to multiple linear regression and find the coefficients.

Mathematical Derivation

We know the equation of a line for simple linear regression is

ŷ = β₀ + β₁x

Similarly, for multiple linear regression with k independent variables, the formula becomes the equation of a plane (or hyperplane):

ŷ = β₀ + β₁x₁ + β₂x₂ + … + βₖxₖ

The above formula can be written in matrix form for all n records as:

[y₁]   [1 x₁₁ x₁₂ … x₁ₖ] [β₀]   [e₁]
[y₂] = [1 x₂₁ x₂₂ … x₂ₖ] [β₁] + [e₂]
[⋮ ]   [⋮  ⋮   ⋮  ⋱  ⋮ ] [⋮ ]   [⋮ ]
[yₙ]   [1 xₙ₁ xₙ₂ … xₙₖ] [βₖ]   [eₙ]

This can be further rewritten compactly. Representing the above matrices with single letters (all capital letters represent matrices in the equations that follow),

Y = Xβ + e

where Y is the n×1 target vector, X is the n×(k+1) design matrix whose first column of ones carries the intercept β₀, β is the (k+1)×1 coefficient vector, and e is the n×1 error vector.
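To make the matrix form concrete, here is a minimal NumPy sketch of building the design matrix X. The marks below are illustrative values, not the ones from the table above.

```python
import numpy as np

# Illustrative marks for 5 students in 4 subjects (made-up values)
marks = np.array([[78., 85., 90., 70.],
                  [82., 79., 88., 75.],
                  [69., 90., 72., 80.],
                  [95., 88., 91., 85.],
                  [74., 70., 65., 60.]])

# Prepend a column of ones so the intercept beta_0 is absorbed into X
X = np.column_stack([np.ones(marks.shape[0]), marks])
print(X.shape)  # (5, 5): n = 5 records, k + 1 = 5 columns
```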

We know the error is given by (Actual − Predicted), so the error can be represented in matrix form as

e = Y − Ŷ

where Ŷ = Xβ is the vector of predictions.

We know that for linear regression with n observations, the error/cost function is given by the sum of squared errors:

E = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² = Σᵢ₌₁ⁿ eᵢ²   (1)

Let's derive a matrix form of this error function. Expanding equation (1), we can rewrite it as

E = e₁² + e₂² + … + eₙ²

The above equation can be rewritten as the product of a row vector and a column vector:

E = [e₁ e₂ … eₙ] [e₁ e₂ … eₙ]ᵀ = eᵀe

Hence, the error/cost function in matrix notation is given by

E = eᵀe
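As a quick sanity check, the sketch below verifies numerically that eᵀe equals the sum of squared errors; the residual values are made up for illustration.

```python
import numpy as np

e = np.array([1.5, -0.5, 2.0])        # made-up residuals
sum_of_squares = np.sum(e ** 2)       # e1^2 + e2^2 + e3^2 = 6.5
matrix_form = e @ e                   # e'e, the inner product of e with itself
print(np.isclose(sum_of_squares, matrix_form))  # True
```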

Substituting e = Y − Ŷ, the error equation becomes

E = (Y − Ŷ)ᵀ(Y − Ŷ)   (2)

By linear algebra, the transpose of a difference is the difference of the transposes: (Y − Ŷ)ᵀ = Yᵀ − Ŷᵀ.

Substituting this into equation (2) and expanding the product, equation (2) can be rewritten as

E = YᵀY − YᵀŶ − ŶᵀY + ŶᵀŶ   (3)

Replacing Ŷ = Xβ in equation (3), and using (Xβ)ᵀ = βᵀXᵀ,

E = YᵀY − YᵀXβ − βᵀXᵀY + βᵀXᵀXβ   (4)

To simplify equation (4), we have to prove that the two middle terms are equal:

YᵀXβ = βᵀXᵀY   (5)

Solving equation (5): YᵀXβ has dimensions (1×n)(n×(k+1))((k+1)×1) = 1×1, so it is a scalar, and a scalar equals its own transpose. Therefore

YᵀXβ = (YᵀXβ)ᵀ = βᵀXᵀY
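The sketch below checks equation (5) numerically on random matrices of compatible shapes.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.normal(size=(4, 1))      # n x 1
X = rng.normal(size=(4, 3))      # n x (k+1)
beta = rng.normal(size=(3, 1))   # (k+1) x 1

lhs = (Y.T @ X @ beta).item()    # Y'X(beta): a 1x1 matrix, i.e. a scalar
rhs = (beta.T @ X.T @ Y).item()  # its transpose
print(np.isclose(lhs, rhs))      # True: a scalar equals its own transpose
```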

After combining the two middle terms, equation (4) becomes

E = YᵀY − 2βᵀXᵀY + βᵀXᵀXβ   (6)

To minimize E, we apply partial derivatives with respect to β and set the result to zero:

∂E/∂β = 0   (7)

Applying matrix differentiation to equation (6), using the identities ∂(βᵀa)/∂β = a and ∂(βᵀAβ)/∂β = 2Aβ for a symmetric matrix A = XᵀX, equation (7) becomes

−2XᵀY + 2XᵀXβ = 0   (8)

Simplifying the equation further,

XᵀXβ = XᵀY

After simplifying, we get the β coefficient matrix:

β = (XᵀX)⁻¹XᵀY

This is the normal equation.
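To confirm the derivation, this sketch solves the normal equation on random data and checks that the gradient from equation (8) vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(6), rng.normal(size=(6, 2))])  # design matrix with ones column
Y = rng.normal(size=(6, 1))

beta = np.linalg.inv(X.T @ X) @ X.T @ Y       # beta = (X'X)^(-1) X'Y
gradient = -2 * X.T @ Y + 2 * X.T @ X @ beta  # left side of equation (8)
print(np.allclose(gradient, 0))               # True: the gradient is zero at the minimum
```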

In the above equation, X represents X_train and Y represents y_train from the dataset. Plugging those values into the equation gives us the β coefficient matrix, which contains (number of independent variables + 1) values. For example, if your dataset has 5 independent variables, you will get 6 coefficients; the extra one is the intercept β₀.

Predictions can then be made using X_test and the β coefficients: the prediction vector is the dot product of X_test and β, that is, Ŷ = X_test β.
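Putting the two steps together, here is a minimal sketch, assuming X_train and X_test already include the leading column of ones; fit_normal_equation and predict are illustrative helper names, not library functions.

```python
import numpy as np

def fit_normal_equation(X_train, y_train):
    """Solve beta = (X'X)^(-1) X'y via a linear solve."""
    return np.linalg.solve(X_train.T @ X_train, X_train.T @ y_train)

def predict(X_test, beta):
    """Predictions are the dot product of X_test and the coefficient vector."""
    return X_test @ beta
```

Using np.linalg.solve avoids forming the explicit inverse, which is both faster and numerically more stable than calling np.linalg.inv directly.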

Conclusion

This normal equation gives the closed-form least-squares solution that scikit-learn's LinearRegression() computes, and it works well for small datasets. Solving it directly is computationally expensive, since the matrix inverse is an O(n³) operation (see the complexity of matrix inversion). To overcome this, the SGDRegressor() method can be used instead, which finds the coefficients with gradient descent.
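For comparison, here is a minimal scikit-learn sketch of both estimators, assuming X_train and y_train hold the raw feature matrix and 1-D target array (scikit-learn adds the intercept itself):

```python
from sklearn.linear_model import LinearRegression, SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Closed-form least-squares fit
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

# Gradient-descent fit; SGD is sensitive to feature scale, so standardize first
sgd_reg = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000, tol=1e-3))
sgd_reg.fit(X_train, y_train)
```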

Please refer to the practical implementation of the Normal Equation here: https://git.io/J6Wes

LinkedIn: Bhanumathi Ramesh
