Decoding:

Multiple Linear Regression

Calculations by hand

Nishigandha Sharma
Analytics Vidhya

--

Multiple linear regression (MLR), also known simply as multiple regression, is a statistical technique that uses several explanatory variables to predict the outcome of a response variable. These variables can be both categorical and numerical in nature.

Please note: Categorical variables must first be encoded numerically, on an ordinal or nominal scale, by assigning a weight to each group of the category. The formula then works with the weights assigned to each category.

Multiple regression is an extension of simple linear regression, which uses just one explanatory variable. The result is still a line equation, but the contributing variables now come from many dimensions. Multiple linear regression is also the base model for polynomial models of degree 2, 3, or more.

If you want to understand the computation of simple linear regression, check out the article here.

y = b₀ + b₁X₁ + ⋯ + bᵣXᵣ + ε

b0 — constant / y-intercept

b1, b2 — coefficients for each variable

X1, X2 — predictors

ε — error term, also known as the epsilon value. It is a small, negligible value, and we will not consider it in this calculation.
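To make this equation concrete, here is a minimal sketch of the prediction rule in Python; the coefficient and predictor values below are placeholders for illustration only, not values fitted to our data:

```python
# y_hat = b0 + b1*X1 + ... + br*Xr, with the error term ε omitted.
def predict(b0, coefficients, predictors):
    """Predicted y for one observation, given fitted coefficients."""
    return b0 + sum(b * x for b, x in zip(coefficients, predictors))

# Placeholder values for illustration only.
print(predict(b0=1.0, coefficients=[0.5, 2.0], predictors=[3.0, 4.0]))
# 1.0 + 0.5*3.0 + 2.0*4.0 = 10.5
```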

We will use the dummy data below for our calculations:

Data under consideration

Here X1 and X2 are the predictors and y is the dependent variable. From the formula of the multiple linear equation given above, we need to calculate b0, b1, and b2. Let's look at the formula for b0 first.

b0 = ȳ - b1*x̄1 - b2*x̄2

As you can see, to calculate b0 we first need b1 and b2. Let's look at their formulae:

b1 = [ (Σx2_sq)(Σx1 y) - (Σ x1 x2)(Σx2 y) ] / [ (Σx1_sq)(Σx2_sq) - (Σ x1 x2)² ]

b2 = [ (Σx1_sq)(Σx2 y) - (Σ x1 x2)(Σx1 y) ] / [ (Σx1_sq)(Σx2_sq) - (Σ x1 x2)² ]

Now this definitely looks like a terrifying formula, but if you look closely, the denominator is the same for both b1 and b2, and each numerator is built from cross-products of the two variables x1 and x2 along with y.
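In code, once the corrected sums (defined in the next step) and the sample means are known, b1, b2 and then b0 follow directly; a minimal sketch, with variable names of my own choosing:

```python
def solve_coefficients(S_x1x1, S_x2x2, S_x1x2, S_x1y, S_x2y,
                       x1_mean, x2_mean, y_mean):
    """Compute b1 and b2 from the corrected sums, then b0 from the means."""
    denom = S_x1x1 * S_x2x2 - S_x1x2 ** 2            # shared denominator
    b1 = (S_x2x2 * S_x1y - S_x1x2 * S_x2y) / denom
    b2 = (S_x1x1 * S_x2y - S_x1x2 * S_x1y) / denom
    b0 = y_mean - b1 * x1_mean - b2 * x2_mean        # y-intercept
    return b0, b1, b2
```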

What is noteworthy is that the values of x1 and x2 here are not the same as our predictors X1 and X2; they are computed (mean-corrected) values derived from the predictors. Before we find b1 and b2, we will compute the following quantities for both x1 and x2, so that we can then compute b1 and b2, followed by b0:

· (Σxi_sq)

· (Σxi y)

· (Σ x1 x2)

Here ‘i’ stands for the predictor index, say variable 1 or variable 2, and N is the number of records, which is 10 in this case. Now we can look at the formulae for each of the quantities needed to compute the coefficients.

(Σxi_sq) = (ΣXi²) - (ΣXi)² / N

(Σxi y) = (ΣXi y) - ((ΣXi)(Σy)) / N

(Σ x1 x2) = (ΣX1 X2) - ((ΣX1)(ΣX2)) / N

It looks like we again have three petrifying formulae, but do not worry; let's take one step at a time and compute the needed values in the table itself.

Original Data computed with the additional required columns
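The same quantities can also be computed directly in code; a minimal NumPy sketch, where X1, X2 and y are assumed to be arrays holding the ten records from the table:

```python
import numpy as np

def corrected_sums(X1, X2, y):
    """Corrected sums of squares and cross-products for two predictors."""
    N = len(y)
    S_x1x1 = np.sum(X1 ** 2) - np.sum(X1) ** 2 / N          # (Σx1_sq)
    S_x2x2 = np.sum(X2 ** 2) - np.sum(X2) ** 2 / N          # (Σx2_sq)
    S_x1y = np.sum(X1 * y) - np.sum(X1) * np.sum(y) / N     # (Σx1 y)
    S_x2y = np.sum(X2 * y) - np.sum(X2) * np.sum(y) / N     # (Σx2 y)
    S_x1x2 = np.sum(X1 * X2) - np.sum(X1) * np.sum(X2) / N  # (Σ x1 x2)
    return S_x1x1, S_x2x2, S_x1x2, S_x1y, S_x2y
```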

Great, now we have all the required values, which, when plugged into the above formulae, give the following results:

Computation required for coefficients

Now it’s time to compute b1, b2 and b0:

Coefficients and the y-intercept calculation

We now have the equation of our multiple linear regression line:

y = b0 + b1*X1 + b2*X2

y = (-0.72) + 0.02(X1) + 0.38(X2)

Now let's compute a prediction for a new value and compare it with the result from Sklearn's library as well:

Say X1 = 5 and X2 = 6:

y = (-0.72) + 0.02(5) + 0.38(6) = -0.72 + 0.10 + 2.28 = 1.66

Now let's compare it with Sklearn's Linear Regression.

Note: Sklearn provides the same LinearRegression class for computing both simple and multiple linear regression.

Linear Regression from Sklearn’s library
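A minimal sketch of that comparison, assuming X1, X2 and y are arrays holding the ten records from the table above:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_and_compare(X1, X2, y):
    """Fit Sklearn's LinearRegression and report the fitted parameters."""
    X = np.column_stack([X1, X2])          # shape (10, 2): one row per record
    model = LinearRegression().fit(X, y)
    print(model.intercept_, model.coef_)   # should match b0 ≈ -0.72, b1 ≈ 0.02, b2 ≈ 0.38
    print(model.predict([[5, 6]]))         # should match the hand-computed 1.66
```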

Yay!!! We have the exact same results with the inbuilt Linear Regression function too.

We can thus conclude that our calculations are correct.

I hope you now have more clarity on how a multiple linear regression model is computed in the back end.

Any feedback is most welcome. Give a clap if you learnt something new today!
