Linear Regression: 7 Assumptions

Jonathan Bogerd
4 min read · Sep 18, 2022


Introduction

In this article we will go over the 7 assumptions of Ordinary Least Squares (OLS). In the previous article we derived the OLS formulas; here we look at the assumptions those formulas rely on and what happens when each of them is violated. In Part 3 we will investigate Linear Regression with multiple input variables. Together, this three-part series will teach you all the basics you need to know about Linear Regression.

Data generating process

In this article we will use the same data generating process (DGP) as in the first article on Linear Regression. The code to generate the data and plot the results can be found in the code box below. We estimate the parameters a and b with the formulas derived in the previous article; if you want to see how we derived those, please check out that article. Note that this time, the calculations for the variances of a and b are included as well. We will show the impact of each assumption of Linear Regression by changing the code to violate it.
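
As a minimal sketch of this setup: the DGP here assumes y = a + b*x + eps with illustrative values a = 2 and b = 0.5 (the exact numbers from the first article may differ), and the helper names generate_data and ols exist only for this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)

def generate_data(n=1000, a=2.0, b=0.5, error=None):
    """Simple linear DGP: y = a + b*x + eps (a and b are illustrative values)."""
    x = rng.uniform(0, 10, n)
    eps = rng.normal(0, 1, n) if error is None else error(n)
    return x, a + b * x + eps

def ols(x, y):
    """OLS estimates of a and b plus their estimated variances."""
    x_bar, y_bar = x.mean(), y.mean()
    sxx = ((x - x_bar) ** 2).sum()
    b_hat = ((x - x_bar) * (y - y_bar)).sum() / sxx
    a_hat = y_bar - b_hat * x_bar
    resid = y - a_hat - b_hat * x
    s2 = (resid ** 2).sum() / (len(x) - 2)        # estimated error variance
    var_b = s2 / sxx
    var_a = s2 * (1 / len(x) + x_bar ** 2 / sxx)
    return a_hat, b_hat, var_a, var_b

x, y = generate_data()
a_hat, b_hat, var_a, var_b = ols(x, y)
print(f"a = {a_hat:.3f} (se {var_a ** 0.5:.3f}), b = {b_hat:.3f} (se {var_b ** 0.5:.3f})")
```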

Assumptions

1. Random disturbance with mean zero

The first assumption for Linear Regression is that the random errors have a mean of zero. If this were not the case, it is easy to see that the parameter estimate for a would be incorrect: it would absorb the mean of the error term. This can be seen in the following example, in which the mean of the error is changed to 3. The resulting estimate for a is about 3 above the correct value, making the parameter estimate for a biased.
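
A minimal sketch of this violation, reusing generate_data and ols from the code box above (the shift of 3 is the illustrative value from the text):

```python
# Violation of assumption 1: errors with mean 3 instead of 0
x, y = generate_data(error=lambda n: rng.normal(3, 1, n))
a_hat, b_hat, _, _ = ols(x, y)
print(a_hat, b_hat)  # a_hat ends up near 5 (true a = 2 plus error mean 3); b_hat is unaffected
```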

2. Homoskedasticity

The next assumption, also on the error terms, is that the variance of all error terms is the same. If this is violated, the parameter estimates will still be unbiased; however, the calculation of the variances of a and b will be incorrect. In the code sample below you can check that the parameter estimates remain unbiased while their actual sampling variance no longer matches the formula.
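
The sketch below, again reusing the helpers from above, assumes an error whose standard deviation grows with x; a small Monte Carlo then shows that b stays unbiased while the variance reported by the OLS formula drifts away from the actual sampling variance:

```python
# Violation of assumption 2: the error variance depends on x
def hetero_data(n=200, a=2.0, b=0.5):
    x = rng.uniform(0, 10, n)
    eps = rng.normal(0, 0.2 + 0.3 * x)            # heteroskedastic errors
    return x, a + b * x + eps

b_hats, var_bs = [], []
for _ in range(2000):
    x, y = hetero_data()
    _, b_hat, _, var_b = ols(x, y)
    b_hats.append(b_hat)
    var_bs.append(var_b)

print(np.mean(b_hats))                   # still close to 0.5: unbiased
print(np.var(b_hats), np.mean(var_bs))   # actual sampling variance vs the (incorrect) formula variance
```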

3. No correlation

The next assumption of OLS is that the error terms are not correlated with each other. When they are, this is usually called serial correlation or autocorrelation. The problem with violating this assumption is that it signals that some information is left uncaptured by the model: you can partially forecast the next error from the current one, so that information should be part of your model. This can be tackled in two main ways: use a time series model, or add an independent variable that captures this information.
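
A sketch with AR(1) errors, reusing the helpers from above (the persistence rho = 0.8 is an illustrative choice):

```python
# Violation of assumption 3: eps_t = rho * eps_{t-1} + u_t
def ar1_errors(n, rho=0.8):
    u = rng.normal(0, 1, n)
    eps = np.zeros(n)
    for t in range(1, n):
        eps[t] = rho * eps[t - 1] + u[t]
    return eps

x, y = generate_data(error=ar1_errors)
a_hat, b_hat, _, _ = ols(x, y)
resid = y - a_hat - b_hat * x
print(np.corrcoef(resid[:-1], resid[1:])[0, 1])  # close to 0.8: each error partially predicts the next
```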

4. Jointly normally distributed error terms

This assumption is technically not required for OLS. Even if the error terms are not normally distributed, our formulas are correct and the parameter estimates unbiased. However, normality does bring advantages. Firstly, under this assumption our parameter estimates have the lowest possible variance of any unbiased estimator. Secondly, this assumption ensures that the OLS estimate coincides with the Maximum Likelihood Estimator (MLE), making our parameter estimation the best possible estimator.
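
As a quick sketch of this point: drawing uniform (hence non-normal) errors leaves the estimates unbiased, while a normality test such as Shapiro-Wilk flags the residuals (this assumes scipy is available):

```python
from scipy import stats

# Non-normal (uniform) errors: estimates are still unbiased, but OLS is no
# longer guaranteed to be the minimum-variance unbiased estimator
x, y = generate_data(error=lambda n: rng.uniform(-1, 1, n))
a_hat, b_hat, _, _ = ols(x, y)
resid = y - a_hat - b_hat * x
print(a_hat, b_hat)            # still close to 2 and 0.5
print(stats.shapiro(resid))    # typically rejects normality of the residuals
```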

5. Constant parameters

Constant parameters in the DGP are required to get accurate estimates for both a and b. If the parameters change during the DGP, there is no way to estimate them properly with a single regression, and the estimates will be biased. If you encounter a real-world situation in which the parameters may have changed during data collection, it can be worthwhile to estimate the parameters separately for each window in which they are constant. Note that changing parameters are closely related to the concept of non-stationarity. An example of changing parameters can be found in the code sample below.
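
The sketch below lets the slope switch halfway through the sample (the break point and parameter values are illustrative):

```python
# Violation of assumption 5: b = 0.5 in the first half, b = 2.0 in the second
n = 1000
x = rng.uniform(0, 10, n)
b_t = np.where(np.arange(n) < n // 2, 0.5, 2.0)
y = 2.0 + b_t * x + rng.normal(0, 1, n)

print(ols(x, y)[1])                    # a single b_hat lands between 0.5 and 2.0
print(ols(x[:n // 2], y[:n // 2])[1])  # estimating per window recovers ~0.5 ...
print(ols(x[n // 2:], y[n // 2:])[1])  # ... and ~2.0
```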

6. Fixed Regressors

Fixed regressors can be seen as a fixed set of numbers rather than random variables themselves. We needed this in the derivation of the formulas for our parameters; if you missed this, please read my previous article on this topic. In practice, however, it is more reasonable to think of our regressors X as outcomes of an underlying random process. It can be shown that our formulas still hold up in this case, provided that the regressors are independent of all error terms.
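
A sketch of the failure mode this proviso rules out: if x is constructed to depend on the error term, the slope estimate becomes biased (the dependence below is purely illustrative):

```python
# Random regressors are fine if they are independent of the errors;
# here x is deliberately made to depend on eps, breaking that independence
eps = rng.normal(0, 1, 1000)
x = rng.uniform(0, 4, 1000) + eps   # x is correlated with the error term
y = 2.0 + 0.5 * x + eps
print(ols(x, y)[1])                 # noticeably above 0.5: biased
```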

7. Linear Model

Last but not least, for Linear Regression to provide good results, the underlying model or data generating process should of course be linear as well. Although this is usually unknown, it is important to evaluate the performance of the linear regression model, for instance by comparing R² values or by visual inspection. In the graph below, you can see the performance of a linear regression model on a non-linear data generating process. Clearly, linear regression is not the correct choice in this case.

Linear Model vs Quadratic DGP
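
The code below sketches the same comparison: a straight line fitted to a quadratic DGP (the quadratic form and coefficients are illustrative). Note that R² alone can still look respectable here, which is why a residual plot is the more reliable diagnostic:

```python
# A linear fit to a quadratic DGP
x = rng.uniform(0, 10, 1000)
y = 2.0 + 0.5 * x ** 2 + rng.normal(0, 1, 1000)
a_hat, b_hat, _, _ = ols(x, y)

resid = y - a_hat - b_hat * x
r2 = 1 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
print(r2)  # R² is fairly high despite the misspecification;
           # plotting resid against x reveals a clear U-shape
```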

Conclusion

In this article we discussed the 7 assumptions required for OLS and investigated the consequences of violating them. In the next article, we will take a look at Linear Regression with multiple input variables. If you want to read the first article in this series, you can find the link here.

If you want to read more articles on data science, machine learning and AI, be sure to follow me on Medium!

Sources

  1. Heij, C., de Boer, P., Franses, P. H., Kloek, T., & van Dijk, H. K. (2004). Econometric methods with applications in business and economics. Oxford University Press.
  2. Gujarati, D. N. (2021). Essentials of econometrics. SAGE Publications.
