An introduction to simple linear regression

Spriha Priyadarshi
IoT Lab KIIT
Jun 26, 2021

Linear regression is one of the most popular and best-understood algorithms in the machine learning landscape. The algorithm assumes that the relationship between the dependent variable (Y) and the independent variable(s) (X) is linear and can be represented by a line of best fit. When we have only one independent variable, it is called simple linear regression; when we have more than one, it is called multivariate regression. Since regression tasks are among the most common supervised learning problems, every Machine Learning Engineer should have a thorough understanding of how they work. In this blog, to build that understanding, we will learn how to implement simple linear regression from scratch and apply it to the “Hard Work Pays Off” dataset.

How to perform a simple linear regression

First, we feed the training set into our learning algorithm, which then outputs a function (h) based on what it has learned from the training set. This function is called the “hypothesis”.

The formula for the hypothesis we will use is:

hθ(x) = ŷ = θ₀ + θ₁·x

Here θ₀ is the intercept and θ₁ is the slope; they correspond to theta[0] and theta[1] in the code below.

Let's understand the Cost Function

After we’ve trained our learning algorithm and obtained a hypothesis, we need to examine how good our results are. This is done using a cost function.

While dealing with linear regression we can draw many different lines for different values of the slope and intercept. The main question is which of those lines actually represents the right relationship between X and Y, and to find out we can use the Mean Squared Error (MSE) as the criterion. For linear regression, this MSE is nothing but the cost function.

The Mean Squared Error is the average of the squared differences between the predictions and the true values, and its output is a single number representing the cost. The line with the minimum cost (MSE) represents the relationship between X and Y in the best possible manner. Once we have the slope and intercept of the line which gives the least error, we can use that line to predict Y.
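
As a minimal sketch (the variable names and toy numbers here are purely illustrative, not part of the dataset used later), the MSE cost for a candidate intercept and slope could be computed with NumPy like this:

import numpy as np

def mse(x, y, theta0, theta1):
    # Mean Squared Error of the line y = theta0 + theta1 * x
    y_pred = theta0 + theta1 * x           # predictions of the candidate line
    return np.mean((y_pred - y) ** 2)      # average squared difference

# Toy example: points that lie roughly on y = 2 + 3x
x_toy = np.array([1.0, 2.0, 3.0, 4.0])
y_toy = np.array([5.1, 7.9, 11.2, 13.8])
print(mse(x_toy, y_toy, 2.0, 3.0))   # small cost: this line fits well
print(mse(x_toy, y_toy, 0.0, 1.0))   # much larger cost: this line fits poorly

The line with the smaller MSE is the better candidate, and gradient descent (discussed next) is one way to search for the intercept and slope that minimise it.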

Now let's talk about Gradient Descent

When there are one or more inputs, you can optimise the values of the coefficients by iteratively minimising the error of the model on your training data.

This procedure is called Gradient Descent. It works by starting with initial values for each coefficient (random values, or zeros as in the code below). The sum of squared errors is calculated over the training examples, a learning rate is used as a scale factor, and the coefficients are updated in the direction that reduces the error. The process is repeated until a minimum sum of squared errors is achieved or no further improvement is possible.

When using this method, you must select a learning rate (alpha) parameter that determines the size of the step taken on each iteration of the procedure.
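
As a simplified sketch (not the exact code we write in Step 3), one gradient descent update for the two coefficients of simple linear regression could look like the following, assuming NumPy is imported as np and x, y, theta0, theta1 and alpha are already defined:

# One illustrative gradient descent step for the line y = theta0 + theta1 * x
y_pred = theta0 + theta1 * x              # predictions with the current coefficients
grad0 = np.mean(y_pred - y)               # gradient of (1/2)*MSE w.r.t. theta0
grad1 = np.mean((y_pred - y) * x)         # gradient of (1/2)*MSE w.r.t. theta1
theta0 = theta0 - alpha * grad0           # step against the gradient
theta1 = theta1 - alpha * grad1

Repeating this step many times moves theta0 and theta1 towards the values that minimise the cost; this is exactly what the gradientDescent function in Step 3 does.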

Now we are going to dive a little deeper into a simple linear regression problem, using the data samples (also called training examples) from the dataset described below.

For this dataset, let's assume an institute named ABC provides you with data about its past students and how they performed in the evaluation exam. We want to predict the score a student will get, given the amount of time they spend on coding daily. Let's get started.

Step#1 Importing the required libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D

Step#2 Load and visualise the data

· Download

· Load

· Visualise

· Normalisation

# Load
x = pd.read_csv('./Linear_X_Train.csv')
y = pd.read_csv('./Linear_Y_Train.csv')
# Convert X,Y to Numpy arrays
x = x.values
y = y.values
# Normalisation
u=x.mean()
std = x.std()
x = (x-u)/std

#Visualise
plt.style.use('seaborn')
plt.scatter(x,y,color='orange')
plt.title("Hardwork vs Performance Graph")
plt.xlabel("Hardwork")
plt.ylabel("Performance")
plt.show()
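
A quick note on the normalisation step above: standardising x (subtracting the mean and dividing by the standard deviation) gives it roughly zero mean and unit standard deviation, which generally helps gradient descent converge smoothly with a fixed learning rate. A quick sanity check, just for illustration:

# After standardisation the mean should be ~0 and the std ~1
print(x.mean(), x.std())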

Step#3 Linear Regression

def hypothesis(x,theta):
    y_ = theta[0] + theta[1]*x
    return y_

def gradient(X,Y,theta):
    m = X.shape[0]
    grad = np.zeros((2,))
    for i in range(m):
        x = X[i]
        y_ = hypothesis(x,theta)
        y = Y[i]
        grad[0] += (y_ - y)
        grad[1] += (y_ - y)*x
    return grad/m

def error(X,Y,theta):
    m = X.shape[0]
    total_error = 0.0
    for i in range(m):
        y_ = hypothesis(X[i],theta)
        total_error += (y_ - Y[i])**2
    return (total_error/m)

def gradientDescent(X,Y,max_steps=100,learning_rate=0.1):
    theta = np.zeros((2,))
    error_list = []
    theta_list = []
    for i in range(max_steps):
        # Compute the gradient and the current error
        grad = gradient(X,Y,theta)
        e = error(X,Y,theta)[0]
        # Update theta
        theta[0] = theta[0] - learning_rate*grad[0]
        theta[1] = theta[1] - learning_rate*grad[1]
        # Store the theta values and the error during the updates
        theta_list.append((theta[0],theta[1]))
        error_list.append(e)
    return theta,error_list,theta_list

# Get the final theta values and the lists showing how theta and the error change over time
theta,error_list,theta_list = gradientDescent(x,y)

# Let's see how the error reduces over time
plt.plot(error_list)
plt.show()
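
At this point, theta[0] and theta[1] hold the learned intercept and slope (for the normalised x). If you want to inspect them:

# The learned parameters: theta[0] is the intercept, theta[1] is the slope
print(theta)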

Step#4 Calculate predictions for the training data and plot the best-fit line

y_ = hypothesis(x,theta)
# Training + Predictions
plt.scatter(x,y)
plt.plot(x,y_,color='orange',label="Prediction")
plt.legend()
plt.show()

Step#5 Let’s now load the test data, save its predictions, and evaluate the model with the R² score.

# Load the test data and normalise it with the training mean and std
x_test = pd.read_csv('./Linear_X_Test.csv').values
x_test = (x_test-u)/std
y_test = hypothesis(x_test,theta)
df = pd.DataFrame(data=y_test,columns=["y"])
df.to_csv('y_prediction.csv',index=False)

# R2 (R-Squared) or Coefficient of Determination
def r2_score(Y,Y_):
    num = np.sum((Y-Y_)**2)
    denom = np.sum((Y-Y.mean())**2)
    score = (1-num/denom)
    return score*100

# R2 score on the training data
r2_score(y,y_)

Output

97.09612226971643

An R² score of about 97% on the training data is actually pretty good.
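
As an optional sanity check (assuming scikit-learn is installed), you could compare the hand-rolled score against scikit-learn's implementation, which computes the same quantity on a 0–1 scale:

from sklearn.metrics import r2_score as sk_r2_score  # aliased to avoid clashing with our own r2_score

# scikit-learn returns R² in [0, 1] (it can be negative for very poor fits),
# so multiply by 100 to compare with the function above
print(sk_r2_score(y, y_) * 100)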

For a better understanding, let's visualise the loss function and the theta updates.

# How the loss actually looks
T0 = np.arange(-40,40,1)
T1 = np.arange(40,120,1)
T0,T1 = np.meshgrid(T0,T1)
J = np.zeros(T0.shape)
for i in range(J.shape[0]):
    for j in range(J.shape[1]):
        y_ = T1[i,j]*x + T0[i,j]
        J[i,j] = np.sum((y-y_)**2)/y.shape[0]

# Visualise the J (Loss)
fig = plt.figure()
axes = fig.add_subplot(projection='3d')
axes.plot_surface(T0,T1,J,cmap='rainbow')
plt.show()
# Visualise the theta updates
theta_list = np.array(theta_list)
plt.plot(theta_list[:,0],label="Theta0")
plt.plot(theta_list[:,1],label="Theta1")
plt.legend()
plt.show()

With that, we have reached the end of this blog. I hope it helped you get a feel for the idea behind the linear regression algorithm. Also, leave a comment and tell me how you found this blog.
