Dealing with Multicollinearity in Regression

Mackenzie Mitchell
3 min read · Mar 7, 2020

Multicollinearity describes the degree of association among the so-called independent variables in a regression. It occurs when two or more predictor variables in a regression analysis are strongly correlated with one another, so that one predictor can be used to predict another.
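A quick way to see this association is the correlation between predictors. As a minimal sketch with made-up data (the variables `x1` and `x2` are hypothetical, not from the article), a Pearson correlation near ±1 between two predictors signals multicollinearity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "independent" variables that are actually strongly related:
# x2 is mostly x1 plus a little noise (synthetic example).
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=200)

# A Pearson correlation close to 1 signals multicollinearity.
r = np.corrcoef(x1, x2)[0, 1]
print(round(r, 3))
```

In practice you would inspect the full correlation matrix of the predictors, `np.corrcoef(X, rowvar=False)`, rather than one pair at a time.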

Why Do We Care?

Multicollinearity is problematic because it implies that the predictor variables are not independent of one another: when one multicollinear variable changes, the others change with it. It is important to check for associations among predictors because multicollinearity can make the individual coefficients (the betas) and their t-tests unreliable. If the correlation between two independent variables is 1, we have perfect positive multicollinearity; likewise, if the correlation is -1, we have perfect negative multicollinearity.
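A standard diagnostic for this problem is the variance inflation factor (VIF), which measures how much a coefficient's variance is inflated by the other predictors. As a hedged sketch, here is a minimal NumPy implementation on synthetic data; the `vif` helper and the cutoff in the comment are illustrative conventions, not something specified in the article:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_features).

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on all remaining columns (plus an intercept).
    """
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other predictors
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=300)  # nearly collinear with x1
x3 = rng.normal(size=300)                     # unrelated predictor
X = np.column_stack([x1, x2, x3])
print(np.round(vif(X), 1))  # a common rule of thumb flags VIF above 5 or 10
```

The same quantity is available off the shelf as `variance_inflation_factor` in `statsmodels.stats.outliers_influence` if you would rather not roll your own.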

Causes

There are two types of multicollinearity: data-based and structural. Data-based multicollinearity arises from the data itself, often because of a poorly designed experiment. Structural multicollinearity refers to multicollinearity caused by the researcher, or…
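A classic illustration of structural multicollinearity, created by the researcher rather than the data, is adding a polynomial term: a predictor and its square are strongly correlated. The data below are hypothetical, and centering before squaring is a standard remedy rather than one the article has stated yet:

```python
import numpy as np

rng = np.random.default_rng(2)

# Structural multicollinearity: the modeler derives a new term from an
# existing predictor, e.g. adding x^2 for a polynomial fit.
x = rng.uniform(1, 10, size=200)  # strictly positive range (synthetic data)
x_sq = x ** 2
print(round(np.corrcoef(x, x_sq)[0, 1], 3))  # strongly correlated

# Centering x before squaring removes most of the linear association
# between the term and its square.
xc = x - x.mean()
print(round(np.corrcoef(xc, xc ** 2)[0, 1], 3))  # much closer to 0
```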
