Effect of Multi-collinearity on Linear Regression

Gaurav Sharma · Published in Analytics Vidhya · Sep 17, 2020 · 9 min read

This story is divided into the following experiments:

Experiment 1 — Effect of highly correlated attributes on their respective coefficients
Experiment 2 — Effect of no/less correlated attributes on their respective coefficients
Experiment 3 — Effect of data with both high/low correlated attributes on their respective coefficients
Experiment 4 — Effect of correlated attributes on model’s prediction

And at the end we will study the Variance Inflation Factor (VIF).

Before starting, let's clear up some basics:

  • The regression coefficient associated with an attribute represents the mean change in the dependent variable given a one-unit shift in that independent variable, holding the other predictors constant.
  • We cannot directly compare two raw regression coefficients because they can be on different scales. For example, if the coefficient of x1 is 2 and the coefficient of x2 is 4, we cannot conclude that x2 is more important than x1, because x1 might be a distance in kilometres while x2 is a weight in grams.
  • But we can compare two standardized regression coefficients.

Throughout the notebook I will directly compare regression coefficients, because they are on the same scale as I cooked the data artificially.

Note: Whenever I say estimate of x, I mean estimate of coefficient of x.

Firstly, import all necessary libraries.
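The exact import cell isn't reproduced here, but a minimal set that covers everything used in the sketches below might look like this (the random seed is an assumption, added just for reproducibility):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression, Lasso, Ridge, ElasticNet
from sklearn.model_selection import train_test_split

np.random.seed(42)  # so the "cooked" data below is reproducible
```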

Experiment 1

Effect of highly correlated attributes on their respective coefficients.

First we cook some artificial data,
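The original recipe isn't shown in this extract; a minimal sketch consistent with the description below (y moves by about 2 units per unit of x1, and x2 is a perfect mirror of x1) could be the following, where the sample size and noise level are assumptions:

```python
# Experiment 1 data: x2 is an exact mirror of x1, and y ≈ 2*x1 plus a little noise
n = 1000
x1 = np.random.normal(0, 1, n)
x2 = -x1                                   # perfect (negative) correlation with x1
y = 2 * x1 + np.random.normal(0, 0.5, n)   # true effect: +2 units of y per unit of x1
```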


Now, we apply linear regression to estimate the coefficients of each attribute.
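A sketch of fitting each attribute on its own with scikit-learn (the variable names are mine):

```python
# Regress y on x1 alone, then on x2 alone
lr_x1 = LinearRegression().fit(x1.reshape(-1, 1), y)
lr_x2 = LinearRegression().fit(x2.reshape(-1, 1), y)
print(lr_x1.coef_)   # ≈ [ 2.0]
print(lr_x2.coef_)   # ≈ [-2.0]
```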

Interpretation of result

These estimated coefficients mean that a one-unit change in x1 changes y by approximately 2 units on average in the same direction, and a one-unit change in x2 changes y by approximately 2 units on average in the opposite direction. We know this is true because that is how we cooked the data. So, when these two highly correlated features are fed to the linear regression independently, the coefficient estimates are very close to the true values.

The slight errors in the estimated parameters are due to the noise in the data; with clean, noise-free data the estimated parameters converge to the true parameters.

now, fit both x1 and x2
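A sketch of the combined fit:

```python
# Regress y on x1 and x2 together; with perfectly collinear columns the
# least-squares solution is not unique, and scikit-learn returns the
# minimum-norm one, which splits the effect across the two columns
X = np.column_stack([x1, x2])
lr_both = LinearRegression().fit(X, y)
print(lr_both.coef_)   # ≈ [ 1.0, -1.0]
```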

Interpretation of result

These estimated coefficients mean that a one-unit change in x1 changes y by approximately 1 unit in the same direction, and a one-unit change in x2 changes y by approximately 1 unit in the opposite direction, which we know is not true given how we cooked the data.

The reason is that, because the two features are highly correlated (perfectly correlated in this case), their contribution to changing the response is divided equally between them. The coefficient of x1 can no longer be interpreted as the change in y for a one-unit change in x1; it is now the change in y for a one-unit change in x1 given the accompanying change in x2, i.e., it is conditioned. If x1 and x2 had zero correlation, we could keep x2 constant, change x1 by one unit, and observe the change in y. But with such a high correlation between them we cannot keep x2 constant while changing x1: if x1 changes a little, x2 also changes, depending on its correlation with x1.

So, long story short, if someone now looks at these coefficients they will interpret the results as: a one-unit change in x1 changes y by approximately 1 unit in the same direction, and a one-unit change in x2 changes y by approximately 1 unit in the opposite direction. Such bad estimates are dangerous because critical decisions are made from our interpretation of the model; if the estimates are wrong then the interpretation is wrong, and hence the decisions are wrong. This could have serious consequences in the medical field, for example, where these two features might be components of a drug and we are interested in their effect on some disease (y in this case).

So when two or more highly correlated features are fed to a linear regression, their estimates are very poor and cannot be trusted. Multi-collinearity therefore results in less reliable estimates, and the reliability decreases as the level of multi-collinearity increases: more multi-collinearity means less reliable coefficients.

Experiment 2

Effect of no/less correlated attributes on their respective coefficients.

First we cook some artificial data,
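Again, only the description survives in this extract: y = y1 + y2, with x1 and x2 drawn independently. A sketch could be the following, where the coefficients 2 and 3 (and the noise level) are purely illustrative assumptions:

```python
# Experiment 2 data: x1 and x2 are independent draws, so their correlation is near zero
x1 = np.random.normal(0, 1, n)
x2 = np.random.normal(0, 1, n)
y1 = 2 * x1                                # assumed coefficient
y2 = 3 * x2                                # assumed coefficient
y = y1 + y2 + np.random.normal(0, 0.5, n)
```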

[Figure: pairwise plots of x1, x2 and y]

We can see from the plots that both x1 & x2 are positively correlated with y but have very little correlation with each other, which is also verified by the Pearson correlation below.
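For the check itself, scipy.stats.pearsonr gives both the correlation and its p-value:

```python
r, p_value = pearsonr(x1, x2)
print(r, p_value)   # r close to 0, p-value well above 0.05
```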

The low Pearson correlation between x1 & x2 suggests there is very little correlation between them; in fact, it is not even statistically significant at the 5% level.

So, due to the low multi-collinearity, the estimated coefficients are now more reliable.

Now, we apply linear regression to estimate the coefficients of each attribute.
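A sketch of the individual fits, continuing with the variables above:

```python
print(LinearRegression().fit(x1.reshape(-1, 1), y).coef_)   # close to the true x1 coefficient
print(LinearRegression().fit(x2.reshape(-1, 1), y).coef_)   # close to the true x2 coefficient
```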

Although the estimated parameters are quite reasonable, there are some errors in them, and that is not just due to the noise in the data but also due to the fact that y depends on both x1 and x2 (as y = y1 + y2), while each of these regressions uses only one of them.

now, fit both x1 and x2
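And the combined fit:

```python
X = np.column_stack([x1, x2])
print(LinearRegression().fit(X, y).coef_)   # both estimates move closer to the true values
```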

We can see that, due to the very low multi-collinearity, neither estimated coefficient is affected much; in fact the estimates improve after including both x1 & x2, because y is y1 + y2 and hence depends on both. Using both the predictors that impact y gives us better, more accurate estimates.

Experiment 3

Effect of data with both high/low correlated attributes on their respective coefficients.

First we cook some artificial data,
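A sketch matching the ground truth stated later in this section (y changes by about 1 unit per unit of x1 and about 2 units per unit of x2, with x3 = -x1); the sample size and noise level are assumptions:

```python
# Experiment 3 data: x1 & x3 are perfectly (negatively) correlated,
# while x2 is independent of both
x1 = np.random.normal(0, 1, n)
x2 = np.random.normal(0, 1, n)
x3 = -x1
y = 1 * x1 + 2 * x2 + np.random.normal(0, 0.5, n)
```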

[Figure: pairwise plots of x1, x2 and x3]

So basically x1 & x3 are perfectly correlated, while the pairs (x1, x2) and (x2, x3) have very little correlation.

Now, we apply linear regression to estimate the coefficients of each attribute. We will first fit x1, x2 and x3 individually,
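A sketch of the individual fits:

```python
for name, xi in [("x1", x1), ("x2", x2), ("x3", x3)]:
    print(name, LinearRegression().fit(xi.reshape(-1, 1), y).coef_)
# x1 ≈ +1, x2 ≈ +2, x3 ≈ -1
```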

now, fit x in pairs
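And the pairwise fits:

```python
for name, a, b in [("x1 & x2", x1, x2), ("x1 & x3", x1, x3), ("x2 & x3", x2, x3)]:
    print(name, LinearRegression().fit(np.column_stack([a, b]), y).coef_)
# (x1, x2): close to the truth; (x1, x3): the x1 effect is split across the collinear pair;
# (x2, x3): close to the truth, since x3 carries the same information as x1
```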

When fitted individually, the coefficient estimates of x1, x2 and x3 are pretty close to the true parameters, just a bit noisy, for the reason already explained above. Due to the high multi-collinearity, the estimates are badly affected when y is regressed on x1 & x3 combined. But the estimates improve when y is regressed on both x2 & x3, because y is composed of both x1 and x2, and x3 is nothing but -x1, so the regression gets everything it needs to estimate the true behaviour.

now, fit all x1, x2 and x3
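A sketch of the full fit:

```python
X = np.column_stack([x1, x2, x3])
print(LinearRegression().fit(X, y).coef_)   # ≈ [0.5, 2.0, -0.5]
```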

So, if you have followed the notebook so far, this result should not be unexpected.

Because x1 & x3 are perfectly correlated, the estimates of their coefficients are affected a lot; in this case each is exactly halved because of the correlation of -1. But you may have noticed that the coefficient of x2 is not affected at all and even improves. This is because x2 has very low correlation with both x1 & x3, so the multi-collinearity between x1 & x3 has no effect on its estimate. The estimate also improves because y depends on both x1 & x2 and we provided both.

Ground Truth
— one unit change in x1 changes 1 unit of y in same direction
— one unit change in x2 changes 2 units of y in same direction
— one unit change in x3 changes 1 unit of y in opposite direction

Interpretation from the combined-fit results
— one unit change in x1 changes 0.5 units of y in same direction (which is absolutely False)
— one unit change in x2 changes 2 units of y in same direction (which is absolutely True)
— one unit change in x3 changes 0.5 units of y in opposite direction (which is absolutely False)

We have seen that multi-collinearity affects the estimated parameters, but what about prediction? In some (admittedly rare) cases we may be less interested in interpreting the results than in getting high prediction accuracy; the idea is that we are happy with good overall predictions even if the individual estimates are wrong.

Experiment 4

Effect of correlated attributes on model’s prediction.

We will use the same experiment_1 data and do a train-test split,
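Since the experiment 1 variables were overwritten in this walkthrough, the sketch below re-creates data in the same style (y ≈ 2*x1 + noise, x2 = -x1) under new names, so the experiment 3 variables stay available for the VIF section later; the 70/30 split ratio is an assumption:

```python
x1_a = np.random.normal(0, 1, n)
x2_a = -x1_a
y_a = 2 * x1_a + np.random.normal(0, 0.5, n)

X_train, X_test, y_train, y_test = train_test_split(
    np.column_stack([x1_a, x2_a]), y_a, test_size=0.3, random_state=0)
```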

we first fit x1 only,
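A sketch, scoring on the held-out test set:

```python
lr_single = LinearRegression().fit(X_train[:, [0]], y_train)
print(lr_single.score(X_test[:, [0]], y_test))   # test R² using x1 alone
```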

now, fit both x1 and x2
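And with both predictors:

```python
lr_full = LinearRegression().fit(X_train, y_train)
print(lr_full.score(X_test, y_test))   # test R² is essentially unchanged
```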

So multi-collinearity has no significant effect on the model's predictive performance, which makes sense because only the individual estimates affect each other. You can think of two correlated variables as two entangled variables (where the strength of the entanglement is proportional to the strength of the collinearity between them): if one changes, the other changes too, so the end result stays the same.

Conclusions

From these experiments we conclude the following:

  • If a model has coupled/collinear features, their estimated coefficients are affected and hence can't be trusted. The effect depends on the degree/severity of the collinearity: low collinearity has a negligible effect while high collinearity has a huge effect. (shown in experiments 1 & 2)
  • Only the estimated coefficients of features that are involved in collinearity with others are affected. (shown in experiment 3)
  • Multi-collinearity doesn't affect the predictive power of the model. (shown in experiment 4)

Long Story Short

If the model's predictive performance is your only goal, there is no need to worry about multi-collinearity. But if interpreting the fitted model is your primary goal, or as important as performance, then you should not base your decisions on a model fitted on collinear predictors.

So the next question that arises is: do we need to find multi-collinearity manually, and if so, what do we do when there are a large number of predictors? Isn't there a tool for that? The answer is that we are not required to do it manually. Most tools use the Variance Inflation Factor (VIF) to measure the strength of multi-collinearity among predictors.

Variance Inflation Factor

VIF detects multi-collinearity in regression analysis by taking each predictor in turn, regressing it against all the other predictors in the model to obtain an R-squared value, and then substituting that value into the VIF formula:

VIF_i = 1 / (1 − R_i²), where R_i² is the R-squared value obtained by regressing the i-th predictor on all the other predictors.

The VIF tells you by how much the variance (i.e., the squared standard error) of each coefficient is inflated. For example, a VIF of 1.9 means the variance of that coefficient is 90% larger than it would be if there were no multi-collinearity, i.e., no correlation with the other predictors.

Interpretation of VIF:
VIF = 1: the predictor is not correlated with the others.
VIF between 1 and 5: moderately correlated.
VIF > 5: highly correlated.

let’s define a function for VIF,
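statsmodels already ships a variance_inflation_factor helper, so the function below (the name vif and its exact shape are my own sketch) simply applies it column by column to a predictors-only DataFrame:

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

def vif(X_df):
    """Return the VIF of every column of a predictors-only DataFrame."""
    X = sm.add_constant(X_df)   # include an intercept, matching the OLS fits below
    return pd.Series(
        [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
        index=X_df.columns,
    )
```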

we will use the same experiment_3 data,
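Collecting the experiment 3 variables into a DataFrame for the statsmodels fits (a sketch; the name data is mine):

```python
data = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3, "y": y})
```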

let’s first fit the OLS on just x1 and x2 (they are not correlated),
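A sketch of the fit and the VIF check:

```python
X12 = sm.add_constant(data[["x1", "x2"]])
model_12 = sm.OLS(data["y"], X12).fit()
print(model_12.params)           # const ≈ 0, x1 ≈ 1, x2 ≈ 2
print(vif(data[["x1", "x2"]]))   # both VIFs ≈ 1
```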

So, all the estimated parameters match the true parameters. Also, the VIF of both x1 & x2 is almost 1, indicating that they are not correlated with each other; hence all the estimates and statistics calculated by OLS are correct and can be trusted.

now, fit the OLS on all x1, x2 and x3 (note, x1 and x3 are correlated),
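And the same with all three predictors:

```python
X123 = sm.add_constant(data[["x1", "x2", "x3"]])
model_123 = sm.OLS(data["y"], X123).fit()
print(model_123.params)                # x1 and x3 estimates are distorted, x2 ≈ 2
print(vif(data[["x1", "x2", "x3"]]))   # VIF of x1 and x3 blows up, x2 stays ≈ 1
print(model_123.summary())             # the summary's notes flag the strong multicollinearity
```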

This time the estimated parameters do not all match the true parameters. x1 & x3 have wrong estimates due to their perfect correlation, while x2 has a correct estimate due to no correlation. Also, the VIF of both x1 & x3 is infinity, indicating that they are perfectly correlated with each other; hence their estimates and the statistics calculated by OLS are incorrect and can't be trusted. x2, however, is unaffected and we can trust all of its estimates and statistics.

Also note warning [2] in the OLS summary: the model automatically flags such high multi-collinearity in the design matrix.

Some techniques to deal with multi-collinearity

  • Step-wise removal of predictors with a large VIF, i.e., a regress-remove-regress cycle until all VIFs are in a satisfactory range.
  • Ridge regression and lasso regression (or their mixture, elastic-net regression) are designed to cope with multi-collinearity.
  • Apply PCR (principal component regression), though this costs you model interpretability, since principal components are hard to interpret; if interpretation is your goal, you won't want to use it.

Lasso regression,
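A sketch using the experiment 4 training data from above; the alpha value is an arbitrary assumption:

```python
lasso = Lasso(alpha=0.1).fit(X_train, y_train)
print(lasso.coef_, lasso.score(X_test, y_test))
```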

ridge regression,
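Likewise for ridge (alpha again assumed):

```python
ridge = Ridge(alpha=1.0).fit(X_train, y_train)
print(ridge.coef_, ridge.score(X_test, y_test))
```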

elastic-net regression,
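And for elastic-net (alpha and l1_ratio assumed):

```python
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X_train, y_train)
print(enet.coef_, enet.score(X_test, y_test))
```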

We see that these regularized regressors push one of the correlated variables close to zero, resulting in a better estimate for the other variable. This is because one of x1 & x2 is redundant here, i.e., only one of them is needed, so it is good to push one of them close to (or exactly to) zero, which gives a better estimate of the other.

You can get the entire documented Jupyter notebook for this blog from here; you just need to fork it. Also, if you like the notebook, please up-vote it; it motivates me to create further quality content.

If you like this story then do clap and also share with others.

Also, have a read of my other stories, which cover a variety of topics.


Thank you once again for reading my stories, my friends :)
