How Multicollinearity Is a Problem in Linear Regression

Moeedlodhi · Published in The Startup · 5 min read · Aug 13, 2020

Photo by Franki Chamaki on Unsplash

Linear Regression is one of the simplest and most widely used algorithms for supervised machine learning problems where the output is a numerical (quantitative) variable and the input is one or more independent variables.
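As a quick illustration of that setup, here is a minimal ordinary-least-squares fit on made-up data (the variable names and the true coefficients are my own placeholders, not anything from a real dataset):

```python
import numpy as np

# Hypothetical data: predict y from two independent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Ordinary least squares: prepend an intercept column and solve.
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print(coef.round(2))  # recovers roughly [3.0, 2.0, -1.5]
```

The fitted coefficients land close to the true values of 3, 2, and −1.5, which is exactly the kind of clean behavior the assumptions below are meant to guarantee.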

The math behind it is easy to understand, and that’s what makes Linear Regression one of my favorite algorithms to work with. But this simplicity comes at a price.

When we decide to fit a Linear Regression model, we have to make sure certain conditions are satisfied; otherwise our model will perform poorly or give us incorrect interpretations. So what are these conditions?

1. Linearity: X and the mean of Y have a linear relationship.

2. Homoscedasticity: the variance of the error terms is the same for all values of X.

3. No collinearity: the independent variables are not highly correlated with each other.

4. Normality: Y is normally distributed for any value of X.

If the above four conditions are satisfied, we can expect our Linear Regression model to perform well.
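Condition 3 is the one this article is about, and it is also one we can check numerically. A common diagnostic is the variance inflation factor (VIF): regress each predictor on all the others and see how much of it they explain. Here is a minimal sketch with numpy; the data is synthetic, with `x3` deliberately constructed as a near-copy of `x1` to simulate multicollinearity:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column)."""
    vifs = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns (with an intercept).
        A = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1.0 - resid.var() / X[:, j].var()
        vifs.append(1.0 / (1.0 - r2))  # VIF_j = 1 / (1 - R_j^2)
    return vifs

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)                    # independent of x1
x3 = x1 + rng.normal(scale=0.05, size=200)   # nearly a copy of x1
X = np.column_stack([x1, x2, x3])
print([round(v, 1) for v in vif(X)])
```

A common rule of thumb treats a VIF above 5 or 10 as a sign of problematic collinearity; here `x1` and `x3` blow well past that threshold while `x2` stays near 1.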

So how do we ensure the above conditions are met? Well, if I started going into the depth of all of the above conditions, it might result in a very long…
