Deriving the Equations for Simple Linear Regression - OLS Method

Bhanumathi Ramesh
5 min read · Oct 17, 2021


Deriving Equations was not this simple

In this article, we will derive the equations for the coefficients of simple linear regression using the error terms and the error function.

What is Regression?

Regression, in simple words, is a method of estimating a value based on another value.
For example, estimating height based on age.
This method is used for:
1. Forecasting
2. Finding cause-and-effect relationships between variables

What is Linear Regression?
Linear Regression is a supervised machine learning algorithm. It is a type of regression analysis in which there is a linear relationship between the independent (X) and dependent (Y) variables. The linear relationship can be either positive or negative.

The objective of Linear Regression :
To fit a line such that the distances between the actual data points and the fitted line are as small as possible.

Types of Linear Regression
1. Simple Linear Regression
2. Multiple Linear Regression
3. Polynomial Regression

Simple Linear Regression

It is an approach for predicting Y (the dependent variable) on the basis of a single independent variable.
It assumes that there is an approximately linear relationship between X and Y.

This is given by Y ≈ β1X + β0,
where,
X is the independent variable
Y is the dependent variable
β1 is the slope (also called the coefficient or weight)
β0 is the y-intercept, i.e. the value of Y when X = 0 (also called the offset)

β0 and β1 are two unknown constants that represent the intercept and slope terms in the linear model.
These β0 and β1 are known as the model coefficients or model parameters.

Once we train the model on our training data to produce the estimates β0hat and β1hat, we can predict future values for a particular x by computing yhat = β0hat + β1hat * x.
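As a quick illustration, here is a minimal sketch of that prediction step in Python; the coefficient values below are hypothetical, not estimates from real training data:

```python
# Minimal sketch of prediction with hypothetical fitted coefficients.
beta0_hat = 2.0  # hypothetical intercept estimate
beta1_hat = 0.5  # hypothetical slope estimate

def predict(x):
    # yhat = beta0hat + beta1hat * x
    return beta0_hat + beta1_hat * x

print(predict(10))  # 7.0
```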

Mathematical Intuition
How to find the values of the coefficients (β0 and β1)

The coefficients can be found in two different ways (a short sketch comparing the two follows this list):
Closed-form
1. Uses a direct formula — Wikipedia
2. Also called Ordinary Least Squares (LinearRegression() in scikit-learn uses the OLS method by default)
Non-closed-form — uses differentiation
— Solved by gradient descent (SGDRegressor() in scikit-learn uses gradient descent)
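As a rough sketch (with made-up toy data), the two estimators can be compared side by side; on well-scaled data both should recover nearly the same coefficients:

```python
# Sketch: closed-form OLS vs. gradient descent in scikit-learn (toy data).
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

ols = LinearRegression().fit(X, y)                      # closed-form OLS
sgd = SGDRegressor(max_iter=10000, tol=1e-6).fit(X, y)  # gradient descent

print(ols.intercept_, ols.coef_)  # direct-formula estimates
print(sgd.intercept_, sgd.coef_)  # iterative estimates, approximately equal
```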

Why do we use gradient descent instead of the direct OLS formula?
— As the dimensionality increases, the computational cost of the OLS formula grows quickly (the general closed form involves inverting a matrix whose size depends on the number of features). OLS is preferred when the dataset is very small.

Ordinary Least Square Method (Closed Form)
The OLS method finds the β0 and β1 that minimize the sum of squared errors.

Closed-form formulas from the OLS method:

β1 = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²
β0 = ȳ − β1x̄

where x̄ and ȳ are the means of x and y.
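Translated directly into code, a minimal sketch of these two formulas could look like this (the function name ols_coefficients is my own choice):

```python
import numpy as np

def ols_coefficients(x, y):
    # beta1 = sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
    # beta0 = ybar - beta1 * xbar
    x_bar, y_bar = x.mean(), y.mean()
    beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1
```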

Let's build this equation from scratch

Assuming that the relationship between the independent and dependent variables is linear in nature, we try to fit a "line of best fit" that passes as close as possible to all the data points, in order to reduce the errors (also called residuals).

Residuals

Here d1, d2, d3, d4, …, dn represent the errors (residuals).
The total error can then be written as

d1 + d2 + d3 + d4 + … + dn

Errors can be negative or positive, so when summed they can cancel each other out. To overcome this issue and capture the total error, we square them.
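A tiny numeric illustration (made-up residual values) shows the cancellation problem:

```python
# Signed errors cancel; squared errors do not.
errors = [2.0, -2.0, 1.5, -1.5]
print(sum(errors))                  # 0.0: the signed sum hides the misfit
print(sum(e ** 2 for e in errors))  # 12.5: the squared sum exposes it
```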

Why is the modulus not used to convert the errors to positive values?

There are two reasons:
1. We want to penalize outliers, and squaring does this more strongly.
2. The modulus function is not differentiable at the origin.

Now the error term can be written as:

E = d1² + d2² + d3² + … + dn²

This E is referred to as the Error Function or Cost Function (in some books, the error term is represented by J).

Decomposing the error term further: each residual is dᵢ = yᵢ − ŷᵢ = yᵢ − (β0 + β1xᵢ), so the error becomes a function of β0 and β1:

E(β0, β1) = Σᵢ (yᵢ − β0 − β1xᵢ)²
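As a small sketch, this error function can be written directly in Python (the name error_function is my own choice):

```python
import numpy as np

def error_function(beta0, beta1, x, y):
    # E(beta0, beta1) = sum over i of (y_i - beta0 - beta1 * x_i)^2
    return np.sum((y - beta0 - beta1 * x) ** 2)
```

Smaller values of E correspond to a better-fitting line; OLS looks for the (β0, β1) pair with the smallest E.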

Now that we have decomposed the error function, we need to find the values of β0 and β1 that minimize the error.

From theory we know that y = f(x), meaning y is a function of x: a change in x changes the value of y.

Similarly, a change in β0 and β1 changes the value of the error; we look for the values of β0 and β1 at which the error is smallest.

From maths, we know the following facts about derivatives and the chain rule:

Power rule: d/dx (xⁿ) = n·xⁿ⁻¹
Chain rule: d/dx f(g(x)) = f′(g(x)) · g′(x)
A function reaches a minimum at a point where its derivative is 0 (a critical point).

To find the minimum value of the error function, we carry out two steps (a symbolic check of step 1 appears after this list):
1. Find the partial derivatives of the error with respect to β0 and β1.
2. Set the derivatives to 0 and find the critical points, where the slope of the tangent line becomes 0.
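As a sanity check, here is a sketch that uses sympy to take the two partial derivatives of a single squared-error term symbolically (sympy is an assumption on my part, not something the article uses):

```python
import sympy as sp

b0, b1, xi, yi = sp.symbols('beta0 beta1 x_i y_i')
E_i = (yi - b0 - b1 * xi) ** 2  # one term of the error function

# Equals -2*(y_i - beta0 - beta1*x_i), possibly printed in expanded form
print(sp.diff(E_i, b0))
# Equals -2*x_i*(y_i - beta0 - beta1*x_i)
print(sp.diff(E_i, b1))
```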

Finding β0

Setting the partial derivative of E with respect to β0 to 0:

∂E/∂β0 = Σᵢ 2(yᵢ − β0 − β1xᵢ)(−1) = 0
⇒ Σᵢ yᵢ − nβ0 − β1 Σᵢ xᵢ = 0
⇒ β0 = ȳ − β1x̄

Finding β1

Setting the partial derivative of E with respect to β1 to 0:

∂E/∂β1 = Σᵢ 2(yᵢ − β0 − β1xᵢ)(−xᵢ) = 0

Substituting β0 = ȳ − β1x̄ gives yᵢ − β0 − β1xᵢ = (yᵢ − ȳ) − β1(xᵢ − x̄), so

Σᵢ xᵢ[(yᵢ − ȳ) − β1(xᵢ − x̄)] = 0

Since Σᵢ x̄(yᵢ − ȳ) = 0 and Σᵢ x̄(xᵢ − x̄) = 0, the xᵢ factors can be centered to (xᵢ − x̄), giving

β1 = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²
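Here is a minimal end-to-end sketch (with made-up data) checking the derived formulas against scikit-learn's LinearRegression; the two sets of coefficients should agree:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Derived closed-form estimates
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()

lr = LinearRegression().fit(x.reshape(-1, 1), y)
print(beta0, beta1)                # manual OLS estimates
print(lr.intercept_, lr.coef_[0])  # scikit-learn estimates, should match
```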

The implementation and a comparison of the derived formulas with the LinearRegression algorithm are presented at the link below.

Please find the example here — GitHub

Conclusion

This article presented the derivation of the equations for the coefficients of simple linear regression. Multiple Linear Regression and Polynomial Regression are out of the scope of this article and will be covered in upcoming articles.

LinkedIn Profile Bhanumathi Ramesh
