Mathematics of Linear Regression

Adarsh Pathak · Published in Mathematics and AI · May 3, 2020

So far we have learned basic algebra and optimisation techniques, so we can now proceed to our first machine learning algorithm. Let's see how to write our own linear regression code. This article builds on the previous articles; if you have not read them, please do so first for better understanding.

Let’s take an example to make things concrete. Suppose you have an experience vs salary dataset, and your task is to build a machine learning model that predicts the salary for a given experience. What will you do?

Let’s see what the data looks like on a graph, and then decide what to do next.

experience vs salary chart

The graph above gives a clear hint: we have to find the equation of a line whose distance from each data point is as small as possible. Let me explain this line with an example.

explanations of examples

From the figure above you can see that we want to minimize d1, d2, d3, d4, d5, d6. How will you solve this problem?

Before we start solving, let’s first write an equation for that line. We know that the equation of a line is y = mx + c, where m is the slope of the line and c is the intercept. So our task is to find the best values of m and c for which the distance of the line from each data point is minimum. Here is your dataset with 10 examples:

Dataset
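To make the line concrete in code, here is a minimal Python sketch of the prediction function. The helper name predict is an illustrative choice, not from the original notebook; it simply evaluates y = mx + c.

```python
def predict(x, m, c):
    """Predicted salary for a given experience x, using the line y = m*x + c."""
    return m * x + c
```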

We will use the Mean Squared Error (MSE) here because it is differentiable and always non-negative. Our task is then simply to minimise the numerical value of this error.

error = (1/n) * Σ (Y(i) − Y_hat(i))², summed over all n examples

Y(i) is the actual value and Y_hat(i) is the predicted (calculated) value. n is the total number of examples given; we use n rather than m here, since m already denotes the slope.
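As a quick sketch of this formula in Python (the function name mse is an illustrative choice):

```python
def mse(Y, Y_hat):
    """Mean squared error between actual values Y and predicted values Y_hat."""
    n = len(Y)
    return sum((y - yh) ** 2 for y, yh in zip(Y, Y_hat)) / n
```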

Let’s suppose our value of m is 1 and c is −2 (m = 1, c = −2).

With these values our new line equation will be Y = X − 2.

Let’s calculate Y_hat for the first example: Y_hat(0) = 0 − 2.

Y_hat(0) = −2, but the actual value is 30000, so the error for this example is huge: 30002². This error is not acceptable. Now let’s guess another value of m and c.

Let’s say m = 1, c = 3000. The new equation of the line is Y = X + 3000.

Let’s calculate the new value for this equation: Y_hat(0) = 0 + 3000, which is 3000. Our new error will be 27000², which is also not acceptable. You could keep guessing values of m and c and checking the error, but this process is very time consuming. Let’s use our previous gradient descent technique to find the minimum error instead.
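Here is the same guess-and-check step as a quick sketch, using the predict helper from above and only the first data point (X(0) = 0, Y(0) = 30000), since that is the example worked out in the text:

```python
x0, y0 = 0, 30000

# Guess 1: m = 1, c = -2  ->  Y_hat(0) = -2
print((y0 - predict(x0, 1, -2)) ** 2)    # 30002**2 = 900120004

# Guess 2: m = 1, c = 3000  ->  Y_hat(0) = 3000
print((y0 - predict(x0, 1, 3000)) ** 2)  # 27000**2 = 729000000
```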

Our task is to find the values of m and c for which the error is minimum. For this we have to differentiate the error with respect to m and c.

We can write Y_hat(i) = m*X(i) + c.

∂m = ∂error/∂m = (2/n) Σ (m*X(i) + c − Y(i)) * X(i)

∂c = ∂error/∂c = (2/n) Σ (m*X(i) + c − Y(i))
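In code, these two partial derivatives can be sketched as follows (the helper name gradients is hypothetical):

```python
def gradients(X, Y, m, c):
    """Partial derivatives of the MSE with respect to m and c."""
    n = len(X)
    dm = (2 / n) * sum((m * x + c - y) * x for x, y in zip(X, Y))
    dc = (2 / n) * sum((m * x + c - y) for x, y in zip(X, Y))
    return dm, dc
```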

Now let’s keep updating our m and c values until the error reaches its minimum. For this we again use the gradient descent algorithm:

m_new = m − alpha * ∂m

c_new = c − alpha * ∂c

alpha is the step size, or learning rate.

Here is Python code for this algorithm, which will help you find the best values of m and c.

Cost Function
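Below is a minimal sketch of the cost function and gradient descent loop, assuming the dataset is given as two lists X and Y. The function name gradient_descent and the default alpha and iteration count are illustrative choices, not values from the original notebook:

```python
def gradient_descent(X, Y, alpha=0.01, iterations=1000):
    """Find m and c that minimise the MSE via gradient descent."""
    m, c = 0.0, 0.0  # any initial values work
    n = len(X)
    for _ in range(iterations):
        # Gradients of the MSE with respect to m and c (derived above)
        dm = (2 / n) * sum((m * x + c - y) * x for x, y in zip(X, Y))
        dc = (2 / n) * sum((m * x + c - y) for x, y in zip(X, Y))
        # Move against the gradient by a step of size alpha;
        # alpha may need tuning depending on the scale of the data
        m = m - alpha * dm
        c = c - alpha * dc
    return m, c
```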

You can initialise the parameters (w and b in the notebook; m and c above) with any values.

Here is the Python notebook with more explanation. You can check it for the complete implementation.

Feel free to ask your doubts, and if you have any suggestions please let me know. Thanks for reading this article. I hope it helps you understand linear regression better.
