Ordinary Least Squares Regression

Arun Addagatla
Apr 12 · 7 min read

This article is a section of Linear Regression in a NutShell

Ordinary Least Squares (OLS) regression, more commonly known as the linear regression algorithm, is a type of linear least-squares method for estimating the unknown parameters in a linear regression model.

In the case of a model with ’n’ explanatory variables, the OLS regression equation is given as:

y = β₀ + β₁x₁ + β₂x₂ + … + βₙxₙ + ε

where
  • y is the dependent variable
  • β₀ is the intercept of the model
  • βᵢ is the coefficient of the iᵗʰ explanatory variable
  • xᵢ corresponds to the iᵗʰ explanatory variable of the model
  • ε is the random error with zero mean and variance σ²

In OLS, “least squares” stands for minimizing the sum of squared errors, or SSE (Sum of Squared Errors).

The lower the error of the model, the better its explanatory power.

So this method aims to find the line which minimizes the sum of squared errors.

We can find many lines that fit the data but the OLS determines the one with the smallest error.

Graphically, it is the one closest to all points simultaneously.

Such a system usually has no exact solution, so the goal is instead to find the coefficients β which fit the equations “best”.

Simple linear regression

For the simple linear regression model, the computation is simple. Consider the equation of simple linear regression:

y = α + βx + ε

To calculate the values of α and β, OLS minimizes the error term using the equations:

β = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²

α = ȳ − β x̄
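These two formulas can be sketched directly in NumPy (the sample data below is made up for illustration):

```python
import numpy as np

# Hypothetical sample data, roughly following y = 2 + 3x plus noise
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.1, 7.9, 11.2, 13.8, 17.0])

# beta = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
# alpha = y_mean - beta * x_mean
x_mean, y_mean = x.mean(), y.mean()
beta = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
alpha = y_mean - beta * x_mean

print(alpha, beta)  # 2.09 and 2.97, close to the true intercept 2 and slope 3
```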

Multiple Linear Regression

For multiple linear regression, the computation becomes a bit more complex. Since multiple regression involves more than two dimensions, the model is represented by a high-dimensional hyperplane rather than a line.

Since this is a minimization problem, we make use of calculus and linear algebra to determine the coefficients of that hyperplane.

The expression to be minimized is the sum of squared residuals:

S(b) = Σᵢ (yᵢ − xᵢᵀb)² = (y − Xb)ᵀ(y − Xb)

whose minimizer has the closed form:

b = (XᵀX)⁻¹ Xᵀy

  • ᵀ denotes the matrix transpose
  • X is the matrix of independent-variable values, whose iᵗʰ row is Xᵢ = xᵢᵀ
  • y denotes the vector of dependent-variable values
  • The value of b which minimizes this sum of squared errors is called the OLS estimator for β

Suppose b is a “candidate” value for β. The quantity (yᵢ − xᵢᵀb), called the residual for the iᵗʰ observation, measures the vertical distance between the data point (xᵢ, yᵢ) and the hyperplane y = xᵀb, and thus assesses the degree of fit between the actual data and the model.

The residual of an observed value is the difference between the observed value and the estimated value of the quantity of interest.
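Under these definitions, the closed-form estimator can be sketched in a few lines of NumPy. The toy design matrix below is invented; the normal equations are solved with `np.linalg.solve` for clarity (in practice `np.linalg.lstsq` is numerically preferable):

```python
import numpy as np

# Toy design matrix: a column of ones for the intercept plus two regressors
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0]])

# Generate y from known coefficients beta = [1, 2, 3] with no noise
beta_true = np.array([1.0, 2.0, 3.0])
y = X @ beta_true

# OLS estimator: b = (X^T X)^(-1) X^T y, via the normal equations
b = np.linalg.solve(X.T @ X, X.T @ y)

# Residuals y_i - x_i^T b; zero here because y lies exactly in X's column space
residuals = y - X @ b
print(b)  # recovers [1, 2, 3]
```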

We could try minimizing the sum of squared errors on paper, but with a larger dataset this becomes practically impossible.

Nowadays, regression analysis is performed through software and programming languages like SAS, Excel, Python, and R.

There are other methods for determining the regression line. They are usually preferred in different contexts.

Some of them are :

  • Generalized least squares
  • Maximum likelihood estimation
  • Bayesian regression
  • Kernel regression
  • Gaussian process regression

However, OLS is still powerful enough for many, if not most, linear problems.

There are five different assumptions of OLS to be considered before performing regression analysis.

  1. Linearity
  2. No endogeneity
  3. Normality and Homoscedasticity
  4. No Autocorrelation
  5. No Multicollinearity

Linear regression assumes linearity: each independent variable is multiplied by a coefficient, and the products are summed to predict the value. Linear regression is the simplest non-trivial relationship; it is called linear because the equation is linear in its coefficients.

Linearity means there must be a linear relationship between dependent and independent variables.

Check for Linearity

One way is to make a scatter plot of the independent variable against the dependent variable. If the data points form a pattern that looks like a straight line, then a linear regression model is suitable.

Fixes for linearity

  • Run a non-linear regression
  • Exponential transformation
  • Logarithmic transformation
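As an illustration of the logarithmic fix (with fabricated exponential-growth data), taking logs can turn a curved relationship into a linear one, which shows up as a jump in the correlation with a straight line:

```python
import numpy as np

x = np.arange(1, 21, dtype=float)
y = 2.0 * np.exp(0.3 * x)  # exponential relationship: clearly non-linear

# Pearson correlation of x with raw y vs. with log(y)
r_raw = np.corrcoef(x, y)[0, 1]
r_log = np.corrcoef(x, np.log(y))[0, 1]

print(r_raw, r_log)  # r_log is 1.0, since log(y) = log(2) + 0.3x is linear in x
```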

No endogeneity refers to the prohibition of a link between the independent variables and the errors.

Mathematically, this is expressed as:

Cov(εᵢ, xᵢ) = 0

If the error term (the difference between the observed values and the predicted values) is correlated with an independent variable, this assumption is violated. The most common cause of such a violation is omitted variable bias.

The omitted variable bias is introduced when the relevant variable is not included in the analysis.

Basically, everything which is not explained by the model goes into the error.


  • The incorrect exclusion of a variable leads to biased and counterintuitive estimates that are toxic to regression analysis.
  • An incorrect inclusion of a variable leads to inefficient estimates; it does not bias the regression, and one can simply drop such variables.
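A small simulation (with invented coefficients) makes the bias concrete: when a relevant variable z that is correlated with x is omitted, the estimated coefficient on x absorbs part of z’s effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
z = 0.8 * x + 0.6 * rng.normal(size=n)  # z is correlated with x
eps = 0.1 * rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * z + eps       # true model includes both x and z

def ols(X, y):
    """OLS estimator b = (X^T X)^(-1) X^T y via the normal equations."""
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)
b_full = ols(np.column_stack([ones, x, z]), y)  # coefficient on x is near 2
b_omit = ols(np.column_stack([ones, x]), y)     # biased toward 2 + 3*0.8 = 4.4

print(b_full[1], b_omit[1])
```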

Fixes for Endogeneity

Omitted variable bias varies from problem to problem. It is always sneaky, and overcoming it requires experience and advanced knowledge of the subject matter.

1. Normality - We assume that the error term is normally distributed.

What if the error term is not normally distributed?

The solution to the problem is the central limit theorem.

The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normal.

Thanks to this theorem, with a sufficiently large sample the error term can be treated as approximately normal.
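A quick NumPy simulation illustrates the theorem: sample means drawn from a decidedly non-normal (uniform) population still cluster around μ with spread close to σ/√n:

```python
import numpy as np

rng = np.random.default_rng(42)
pop_mean, pop_std = 0.5, (1 / 12) ** 0.5  # mean and std of a Uniform(0, 1) population

# Draw 5,000 samples of size n = 50 and record each sample's mean
n = 50
sample_means = rng.random((5_000, n)).mean(axis=1)

print(sample_means.mean())  # close to mu = 0.5
print(sample_means.std())   # close to sigma / sqrt(n), about 0.041
```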

2. Homoscedasticity - Homoscedasticity means having equal variance. The error terms should all have equal variance.

Consider an example,

If a person is poor, he or she will spend a roughly constant amount of money on food and other necessities. But the wealthier an individual is, the higher the variability of his or her expenditure. Therefore heteroscedasticity exists.

Heteroscedasticity, by contrast, refers to the circumstance in which the variability of a variable is unequal across the range of values. This is often due to the presence of outliers in the data.

An outlier in this context is an observation that is either much smaller or much larger than the other observations in the sample.

Fixes for heteroscedasticity

  • Check for Omitted variable bias
  • Look for outliers and try to remove them
  • Perform log transformation on the explanatory variable
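To sketch the log-transform fix (with simulated expenditure-style data whose noise grows with x), compare the residual spread in the lower and upper halves of the sample before and after taking logs:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(1, 100, 500)
eps = rng.normal(scale=0.2, size=x.size)
y = 3.0 * x * np.exp(eps)  # multiplicative noise: spread of y grows with x

# Residual spread around the known trend, upper half of x vs. lower half
res = y - 3.0 * x
ratio_raw = res[250:].std() / res[:250].std()

# After a log transformation, log y = log 3 + log x + eps has constant spread
log_res = np.log(y) - np.log(3.0 * x)  # equals eps exactly
ratio_log = log_res[250:].std() / log_res[:250].std()

print(ratio_raw, ratio_log)  # ratio_raw is well above 1; ratio_log is near 1
```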

No autocorrelation is also known as no serial correlation. According to this assumption, the errors should be uncorrelated with each other.

Check for Autocorrelation

  • Plot all the residuals on a graph and check for patterns. If no pattern is visible, there is no autocorrelation.
  • Durbin-Watson test - its value falls between 0 and 4. A value of 2 indicates no autocorrelation; values below 1 or above 3 indicate the presence of autocorrelation.
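The Durbin-Watson statistic itself is easy to compute from the residuals (a sketch with simulated residuals; the same statistic is provided by `statsmodels.stats.stattools.durbin_watson`):

```python
import numpy as np

def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals.
    Near 2: no autocorrelation; near 0 or 4: strong positive or negative
    autocorrelation respectively."""
    diff = np.diff(residuals)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)

rng = np.random.default_rng(0)
independent = rng.normal(size=1_000)  # serially uncorrelated residuals
trending = np.cumsum(independent)     # random walk: strong positive autocorrelation

print(durbin_watson(independent))  # close to 2
print(durbin_watson(trending))     # close to 0
```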

Fixes for Autocorrelation

The only solution for autocorrelation is to avoid using linear regression and choose a model designed for serially correlated data instead.

A common example of the autocorrelation problem is time-series analysis.

Multicollinearity refers to a situation in which two or more explanatory variables in a multiple regression model are highly linearly related.

We observe multicollinearity when two or more variables have a high correlation.

Consider an example with a equation a = 2 + 5 * b.

This equation can be rearranged as b = (a - 2) / 5.


  • ‘a’ and ‘b’ are two variables with an exact linear relationship
  • ‘b’ can be represented using ‘a’ and vice versa

A model containing both ‘a’ and ‘b’ as explanatory variables would have perfect multicollinearity. This poses a big problem for our regression model, as the coefficients will be wrongly estimated.

The reasoning is that if ‘a’ can be represented with ‘b’, there is no point in using both; we can just keep one of them.

Check for Multicollinearity

  • Multicollinearity is a big problem but is also the easiest to notice.
  • Before creating the regression, find the correlation between each pair of independent variables.
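This check can be sketched with NumPy’s correlation matrix; using the a = 2 + 5 * b example above (plus an invented unrelated variable c), the pairwise correlation between a and b comes out as exactly 1:

```python
import numpy as np

rng = np.random.default_rng(1)
b = rng.normal(size=100)
a = 2 + 5 * b                 # exact linear combination of b
c = rng.normal(size=100)      # an unrelated variable

# Pairwise correlations between the candidate explanatory variables
corr = np.corrcoef(np.vstack([a, b, c]))
print(corr[0, 1])       # 1.0: perfect multicollinearity between a and b
print(abs(corr[0, 2]))  # small: c is not collinear with a
```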

Fixes for Multicollinearity

  • Drop one of the two features
  • Transform two features into a single feature

Thanks for reading this article! Leave a comment below if you have any questions. Be sure to follow @ArunAddagatla to get notified regarding the latest articles on Data Science and Deep Learning.

You can connect with me on LinkedIn, Github, Kaggle, or by visiting Medium.com.

Geek Culture

A new tech publication by Start it up (https://medium.com/swlh).