Build Your Own Linear Regression Model From Scratch

Ajay Bile
Published in Analytics Vidhya
6 min read · Apr 27, 2020

Let’s start with a real-world example. Consider a one-month-old baby: if a snake appeared in front of that baby, the baby would not get scared; in fact, he or she might even try to touch it. Now suppose that baby has grown up and is 20 years old. If a snake appeared in front of him now, he would definitely run away. Over those 20 years the boy has learned that snakes are dangerous, perhaps from his parents, from TV, from his friends, or perhaps because a snake bit him in the past. Somehow, he has learned the relationship: snake equals DANGER.

I will give you one more practical example, so let’s consider the table below.

Dataset:

Height    Weight
 5.1        54
 6.2        75
 5.8        67
 5.5        65
 5.0        54
 5.3        59

Here, if a person’s height is 5.1, then his weight is 54; if the height is 6.2, then the weight is 75, and so on through the table.
Now, if I asked you: what is the weight of a person whose height is 6.0?

Your guess would be somewhere between 70 and 72, right?

Now let’s analyze how you were able to guess the weight. By looking at the data, we can easily figure out the relationship between height and weight: as height increases, weight also increases.

Your brain is so powerful that, simply by looking at this data, it is able to LEARN the relationship between height and weight.

This is how we as humans LEARN a pattern.

But now the question is: HOW EXACTLY DOES MACHINE LEARNING WORK? Or, HOW DOES A MACHINE LEARN THIS RELATIONSHIP?

So for a height of 6.0 you have your prediction, right? Now it’s time for the machine to predict the weight. Let’s find out how a machine can do that.
Our aim here is to predict the weight when the height is 6.0.

Here I have represented height and weight graphically.

Height Vs Weight

In the graph above you can clearly see that, yes, height and weight are directly proportional to each other.
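If you want to reproduce a scatter plot like this yourself, a minimal sketch using matplotlib (my choice of library; the article does not say how its figure was made) looks like this:

```python
import matplotlib.pyplot as plt

# Height/weight pairs from the table above
heights = [5.1, 6.2, 5.8, 5.5, 5.0, 5.3]
weights = [54, 75, 67, 65, 54, 59]

plt.scatter(heights, weights)
plt.xlabel('Height')
plt.ylabel('Weight')
plt.title('Height Vs Weight')
plt.savefig('height_vs_weight.png')
```
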

Now you might be wondering: what is that line in the graph?

So we all know the basic line equation, i.e. Y = m*X + c

Here,
m = slope of a line
c = intercept i.e. value of Y when X = 0

Let me rewrite this line equation in terms of our problem statement.

weight = m * Height + c

This indicates that if I know the values of m, Height, and c, then I can easily find the weight.

Let’s try this out. Say m = 10, c = 1, and Height = 5.5; in that case the predicted weight would be 56 (weight = 10 * 5.5 + 1).
But if you look at the table above, you will find that when the height equals 5.5, the actual weight is 65.

So my predicted weight is 56 and the actual weight is 65, which gives an ERROR of 9 units (56 - 65 = -9).
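That single-example calculation can be checked in a few lines of Python (a small sketch of the arithmetic above):

```python
# Randomly chosen line parameters from the example above
m, c = 10, 1

height = 5.5   # input height from the table
actual = 65    # actual weight from the table

predicted = m * height + c   # 10 * 5.5 + 1 = 56.0
error = predicted - actual   # 56.0 - 65 = -9.0, i.e. an error of 9 units

print(predicted, error)
```
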

ERROR = Predicted value - Actual value

For understanding purposes I have shown the error for only one example, but the cost function is nothing but the mean of the squared errors over all examples:

Cost = (1/n) * Σ (Predicted_i - Actual_i)²

where n = number of examples.
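The cost function can be written directly in Python. A minimal sketch (the helper name `cost` is mine, not from the original):

```python
def cost(m, c, X, Y):
    """Mean squared error of the line y = m*x + c over all examples."""
    n = len(X)
    return sum((m * x + c - y) ** 2 for x, y in zip(X, Y)) / n

# Heights and weights from the table
X = [5.1, 6.2, 5.8, 5.5, 5.0, 5.3]
Y = [54, 75, 67, 65, 54, 59]

print(cost(10, 1, X, Y))  # cost for our randomly chosen m = 10, c = 1 -> 54.5
```
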

Now the question arises: how can we minimize this cost function? In other words, how can we find the line equation that gives the minimum cost, i.e., how can we reduce the Mean Squared Error?

If you observe carefully, you will notice that I randomly selected the values of m and c in the line equation above (weight = m*Height + c). If, instead of m = 10 and c = 1, I substitute other values for m and c, there is a chance that the cost function’s value will be smaller.

The process of finding better values of m and c that give us the minimum value of the cost function is called the Gradient Descent algorithm.

Hence, gradient descent is a method of updating the values of m and c to reduce the cost function (mean squared error). The idea is that we start with some values for m and c and then change these values iteratively to reduce the error. Gradient descent tells us how to change them.

Convex vs Non-convex function

So we have to keep changing the values of m and c in such a way that we reach the minimum of this convex function.

The size of the step we take each time we update m and c is controlled by the Learning Rate (alpha). The learning rate should be chosen so that we do not overshoot the minimum of the convex function.

m = m - alpha * ∂J/∂m
c = c - alpha * ∂J/∂c

Gradient Descent update rule (a0 = c and a1 = m in our case)
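Where do the update terms come from? Differentiating the cost function J = (1/n) * Σ (Y_pred_i - Y_i)² with respect to m and c (a short derivation, in the same plain-text notation used above) gives:

∂J/∂m = (-2/n) * Σ X_i * (Y_i - Y_pred_i)
∂J/∂c = (-2/n) * Σ (Y_i - Y_pred_i)

where Y_pred_i = m * X_i + c. These are exactly the quantities we multiply by the learning rate at each update step.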

That’s it… let’s now perform the above steps programmatically and see what the best values of m and c are.

import pandas as pd

# Initialization of m and c
m = 0
c = 0

X = [5.1, 6.2, 5.8, 5.5, 5.0, 5.3, 6.0]
Y = [54, 75, 67, 65, 54, 59, 69]

data = pd.DataFrame(list(zip(X, Y)), columns=['X', 'Y'])
print(data.head())

X = data.iloc[:, 0]
Y = data.iloc[:, 1]

# The learning rate
L = 0.0001
# The number of iterations to perform gradient descent
epochs = 1000
# Number of examples
n = float(len(X))

# Performing gradient descent
for i in range(epochs):
    Y_pred = m * X + c                      # Current predicted values of Y
    D_m = (-2 / n) * sum(X * (Y - Y_pred))  # Derivative w.r.t. m
    D_c = (-2 / n) * sum(Y - Y_pred)        # Derivative w.r.t. c
    m = m - L * D_m                         # Update m
    c = c - L * D_c                         # Update c

print(m, c)
# m = 11.05, c = 1.94

def predict(x):
    return m * x + c

predict(6.0)  # 68.26
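As a sanity check (my addition, not from the original article), we can compare these values with the exact least-squares fit, for example via numpy.polyfit. With only 1000 iterations and this small learning rate, gradient descent has not fully converged, so the values still differ; running more epochs moves m and c toward the closed-form solution:

```python
import numpy as np

X = [5.1, 6.2, 5.8, 5.5, 5.0, 5.3, 6.0]
Y = [54, 75, 67, 65, 54, 59, 69]

# Closed-form least-squares fit of a degree-1 polynomial: slope and intercept
slope, intercept = np.polyfit(X, Y, 1)
print(slope, intercept)          # roughly 17.01 and -31.25
print(slope * 6.0 + intercept)   # roughly 70.8
```

Note that the converged fit predicts about 70.8 for a height of 6.0, which matches the 70–72 intuition from the beginning of the article.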

So we can see that, using our own linear regression algorithm, we are able to predict the weight.

Thanks for reading…

The first principle is that you must not fool yourself and you are the easiest person to fool. — Richard Feynman.

