Linear Regression from scratch and basic intuition!

Venkatesha Prasad S · Published in Analytics Vidhya · Sep 4, 2020

Linear Regression is basically the curtain raiser of the Machine Learning field. Understanding it is not very difficult, but it is a very important step: linear regression is one of the easiest machine learning algorithms to implement.

maybe the gradient is just an illusion!

Linear Regression is the process of finding the relationship between two variables with the help of a straight line. Our aim is to find the line that fits the points with minimal loss; this line is called the regression line. Fitting it is a very simple process. If anything is unclear so far, don't worry, we will go through it step by step.

Linear Regression Definition:

As we saw earlier, Linear Regression is a method of finding the relationship between two variables (X and y), where X is the independent variable and y is the dependent variable.

y = w1*X + w0

where,

y — Dependent variable

X — Independent variable

w0 — Bias

w1 — Scale factor (coefficient)

The bias term (w0) gives the model an extra degree of freedom: it lets the line intercept the y-axis away from the origin. For example, with w0 = 30 and w1 = 2, an input X = 1.5 gives y = 2*1.5 + 30 = 33.

Our job is to find the values of w0 and w1 such that the loss is minimal. The two common ways of finding them are the Ordinary Least Squares method and Gradient Descent.

We will use the Gradient Descent method to implement Linear Regression.
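For curiosity's sake, here is a minimal sketch of what the Ordinary Least Squares route would look like using numpy's built-in least-squares solver. The ols_fit helper is my own illustration, not part of the implementation below:

import numpy as np

def ols_fit(X, y):
    # stack a column of ones so the bias w0 is solved alongside w1
    A = np.column_stack([np.ones(len(X)), X[:, 0]])
    # least-squares solution of A @ [w0, w1] ≈ y
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w  # [w0, w1]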

J is the cost function. It measures the squared difference between the predicted values and the actual values.

Our aim is to find the line which has the minimum cost, i.e. to minimize J.
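Written out in the same notation as above (these are exactly the quantities the code below computes):

J(w0, w1) = (1/2m) * Σ (w1*Xi + w0 − yi)²

∂J/∂w0 = (1/m) * Σ (w1*Xi + w0 − yi)

∂J/∂w1 = (1/m) * Σ (w1*Xi + w0 − yi) * Xi

and each gradient descent step updates the weights as:

w0 := w0 − lr * ∂J/∂w0

w1 := w1 − lr * ∂J/∂w1

Here m is the number of samples and lr is the learning rate.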

LET'S GET INTO THE CODING PART!

Let us create a toy dataset with the help of the sklearn library. This is the only part where we will use sklearn; the entire Linear Regression implementation is done with numpy alone.

# LINEAR REGRESSION FROM SCRATCH
from sklearn.datasets import make_regression
from matplotlib import pyplot as plt

X, y = make_regression(n_samples=200, n_features=1, n_informative=1,
                       noise=6, bias=30, random_state=200)
m = 200  # number of samples

plt.scatter(X, y, c="red", alpha=.5, marker='o')
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
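If you want to confirm what make_regression produced before moving on (a quick check of my own, not in the original post):

print(X.shape, y.shape)  # (200, 1) (200,) -> 200 samples, one feature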

h( ) — the hypothesis function. It returns w1*X + w0.

cost( ) — calculates the MSE (mean squared error) between the predicted values and the actual values.

grad( ) — calculates the first-order derivatives of the cost function with respect to w0 and w1.

descent( ) — takes care of the gradient descent operation, i.e. the weight-update process, and iterates until it finds weights where the loss is (close to) minimal.

lr — the learning rate.

graph( ) & my_formula( ) — used for plotting purposes.

import numpy as np

def h(X, w):
    # hypothesis: predicted y = w1*X + w0
    return w[1]*np.array(X[:, 0]) + w[0]

def cost(w, X, y):
    # half mean squared error between predictions and targets
    return (.5/m) * np.sum(np.square(h(X, w) - np.array(y)))

def grad(w, X, y):
    # first-order partial derivatives of the cost w.r.t. w0 and w1
    g = [0]*2
    g[0] = (1/m) * np.sum(h(X, w) - np.array(y))
    g[1] = (1/m) * np.sum((h(X, w) - np.array(y))*np.array(X[:, 0]))
    return g

def descent(w_new, w_prev, lr):
    print(w_prev)
    print(cost(w_prev, X, y))
    j = 0
    while True:
        w_prev = w_new
        g = grad(w_prev, X, y)  # compute the gradient once per step
        w0 = w_prev[0] - lr*g[0]
        w1 = w_prev[1] - lr*g[1]
        w_new = [w0, w1]
        print(w_new)
        print(cost(w_new, X, y))
        # stop when the update is tiny (converged) ...
        if (w_new[0]-w_prev[0])**2 + (w_new[1]-w_prev[1])**2 <= pow(10, -6):
            return w_new
        # ... or after a fixed number of iterations
        if j > 500:
            return w_new
        j += 1

w = [0, -1]             # initial guess for [w0, w1]
w = descent(w, w, 0.01)

def graph(formula, x_range):
    x = np.array(x_range)
    y = formula(x)
    plt.plot(x, y, color="blue")

def my_formula(x):
    return w[0] + w[1]*x

plt.scatter(X, y, c="red", alpha=.5, marker='o')
graph(my_formula, range(-2, 3))
plt.xlabel('X')
plt.ylabel('Y')
plt.show()

That’s it. Simple Linear Regression is done!
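As a quick sanity check (my addition, assuming the code above has already run), you can compare the learned weights against numpy's built-in degree-1 polynomial fit:

# np.polyfit returns [slope, intercept] for a degree-1 fit
slope, intercept = np.polyfit(X[:, 0], y, 1)
print("gradient descent:", w[1], w[0])
print("np.polyfit:     ", slope, intercept)

The two pairs should agree closely; any small gap comes from gradient descent stopping at the convergence threshold.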

Although the accuracy won't be perfect, it is a great model to start with.

This is it for this blog. I hope you guys learnt something useful. Please follow my account. Feel free to ask questions about the blog in the comments and show appreciation through claps. Also, connect with me through my LinkedIn account. Thanks for reading.
