Hand-Solving the math behind Simple Linear Regression(Ordinary Least Squares)

Mazhar
4 min read · Jan 29, 2023


Let’s say you want to implement a linear regression model on some data; you would probably use a programming language with pre-built libraries that hide all the math that goes into linear regression behind some easy-to-use syntax.

But if you are wondering what the math behind a linear regression is, this article is for you. In this article, I will solve the math behind a simple linear regression model and make it as digestible as possible.

To follow along with this article, you should be familiar with some concepts of Calculus, such as derivatives and the chain rule of differentiation.

Let’s start by making some assumptions first:

  1. We have a population represented by the following function, the Population Regression Function:

y = β₀ + β₁x + u

2. From the population, we can extract multiple samples, each with a slightly different set of data points. Let's represent the sample with the following line of best fit, drawn through the sample data points:

ŷ = β̂₀ + β̂₁x

Notice that the equation representing the line drawn through the sample data points has a hat on top of it. The "hat" denotes estimated values, which means that we will be using this line to make predictions.

Residuals:

We know that a sample taken from the population is only a representation of the population, so the line drawn through the sample data points will differ slightly from the population line. In other words, the estimated model (the equation with the hats) is unlikely to fit the data perfectly, so there will be residuals.

We represent a residual by û (pronounced "u hat"). It is the difference between the true value and the estimated value:

û = y − ŷ

Our goal is to find the beta0(hat) and beta1(hat) that minimize û (the residuals).

Finding beta0(hat) and beta1(hat):

There are a number of methods for finding the values of beta0(hat) and beta1(hat) that minimize the residuals; the most common, and the easiest, is the Ordinary Least Squares (OLS) method.

Through OLS, we find the values of beta0(hat) and beta1(hat) that minimize the residuals. The OLS estimator minimizes the sum of the squared residuals, so the equation for the residuals above can be rewritten as:

Σûᵢ² = Σ(yᵢ − ŷᵢ)² = Σ(yᵢ − β̂₀ − β̂₁xᵢ)²

In the above, we have simply squared both sides of the residual equation and summed over all observations, then substituted ŷᵢ = β̂₀ + β̂₁xᵢ. This gives us the sum of squared residuals, which we will minimize to find the values of beta0(hat) and beta1(hat).
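The quantity being minimized can be sketched in a few lines of Python. The data and the candidate estimates below are made up purely for illustration:

```python
# Made-up sample data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Hypothetical candidate estimates for beta0(hat) and beta1(hat).
beta0_hat, beta1_hat = 2.2, 0.6

# u_hat_i = y_i - (beta0_hat + beta1_hat * x_i)
residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]

# Sum of squared residuals -- the quantity OLS minimizes.
ssr = sum(u ** 2 for u in residuals)
print(ssr)
```

Trying different candidate values of beta0_hat and beta1_hat changes ssr; OLS finds the pair that makes it as small as possible.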

Deriving beta0(hat):

To minimize the sum of squared residuals, we take the derivative with respect to beta0(hat) (applying the chain rule) and set it equal to zero:

∂/∂β̂₀ Σ(yᵢ − β̂₀ − β̂₁xᵢ)² = −2 Σ(yᵢ − β̂₀ − β̂₁xᵢ) = 0

Now, the values are in the form of summations, so we divide through by n and replace the sums with mean values, giving ȳ − β̂₀ − β̂₁x̄ = 0.

The logic behind this is that whenever we make a prediction, we make it on the basis of an average. For instance, suppose that for a certain value of x we observe the following y values: (6, 5, 6, 7, 8, 6, 7). The mean of these values is 45/7 ≈ 6.43.

If someone asked me to predict their y value at that x, I would say 6.43, because predictions are made at the mean of the observed values.
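The averaging step can be checked in a couple of lines of Python, using the example values from the text:

```python
# The y values observed at one particular value of x.
ys = [6, 5, 6, 7, 8, 6, 7]

# The prediction is simply the mean of the observed values.
mean_y = sum(ys) / len(ys)
print(mean_y)  # 45 / 7, roughly 6.43
```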

Using the above equation, we can solve for beta0(hat):

β̂₀ = ȳ − β̂₁x̄

Deriving beta1(hat):

Now that we have derived the value for beta0(hat), we can substitute it into the sum of squared residuals to find the value of beta1(hat).

Substituting β̂₀ = ȳ − β̂₁x̄ into the sum of squared residuals, we again use the chain rule to take the derivative, this time with respect to β̂₁, and set it equal to zero. Solving gives:

β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²

The above will give us a value of beta1(hat).
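The two closed-form estimates can be computed by hand in plain Python. The x and y values below are made up for illustration:

```python
# Made-up sample data for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n  # mean of x
y_bar = sum(y) / n  # mean of y

# beta1(hat) = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
den = sum((xi - x_bar) ** 2 for xi in x)
beta1_hat = num / den

# beta0(hat) = y_bar - beta1(hat) * x_bar
beta0_hat = y_bar - beta1_hat * x_bar

print(beta0_hat, beta1_hat)
```

For this data the slope works out to 6/10 = 0.6 and the intercept to 4 − 0.6·3 = 2.2; a library routine such as numpy.polyfit(x, y, 1) should return the same pair (slope first).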

Now that we have found both the value of beta0(hat) and beta1(hat), we can put them into the estimated regression line:

ŷ = β̂₀ + β̂₁x

With this, we have successfully found the values of beta0(hat) and beta1(hat) that minimize the sum of squared residuals.

The next step is to start making predictions. Simply plug in a value of x to get a predicted value of y.
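Prediction is then a one-liner. The estimates below (2.2 and 0.6) are hypothetical values, just to make the example runnable:

```python
# Hypothetical fitted estimates (not from any real dataset).
beta0_hat, beta1_hat = 2.2, 0.6

def predict(x):
    """Return y_hat = beta0_hat + beta1_hat * x."""
    return beta0_hat + beta1_hat * x

print(predict(6))  # 2.2 + 0.6 * 6, roughly 5.8
```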

I hope this article gives you an understanding of the math behind simple linear regression. If you liked this article and would like to see more like it, be sure to follow me!


Data Analyst making career transition towards Product Management. Reach out for inquiries at khanmazhar9101@gmail.com