A Deep Dive into Linear Regression Models in Machine Learning

Hasitha Gallella
10 min read · Dec 19, 2023


A step-by-step explanation of constructing a linear regression model from scratch, with sample code. Rather than relying on external libraries like Sklearn, we’ll dive into the mathematical principles that form the backbone of linear regression in this article.

Machine learning opens up a world of possibilities for predicting outcomes based on data, and one powerful technique within it is supervised learning. In the area of Supervised Machine Learning, linear regression stands as a foundational pillar, offering valuable insights for other Regression and Classification models. While it’s convenient to implement linear regression models with established libraries like Sklearn, understanding the underlying mathematics can deepen your understanding of the algorithm and empower you to customize it to unique challenges.

Without knowing anything about regression, you can create a linear regression model with the sklearn library as shown below.

from sklearn.linear_model import LinearRegression
import numpy as np

# Here you can have a linear regression model in a single line
linear_model = LinearRegression()

X_train = np.array([1.0, 2.0, 3.0, 4.0])  # Data set features
y_train = np.array([121, 237, 401, 445])  # Data set target values

linear_model.fit(X_train.reshape(-1, 1), y_train)  # X must be a 2-D matrix

w, b = linear_model.coef_, linear_model.intercept_
print(f"View Parameters: w = {w}, b = {b:0.2f}")
print(f"Manual prediction for X=2.5 : f_wb = wx+b = {w[0]*2.5 + b}")
X_test = np.array([[2.5]])
print(f"Model prediction for X=2.5 : {linear_model.predict(X_test)[0]:0.2f}")
Output:
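For this toy data set, ordinary least squares gives w ≈ 113.6 and b ≈ 17, so the printout should look roughly like this (exact decimals may vary slightly):

View Parameters: w = [113.6], b = 17.00
Manual prediction for X=2.5 : f_wb = wx+b = 301.0
Model prediction for X=2.5 : 301.00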

In this article, we’ll discuss constructing a linear regression model from scratch. Rather than relying on external libraries as above, we’ll go into the mathematical principles that form the backbone of linear regression. Whether you’re a beginner seeking a foundational understanding or an experienced practitioner aiming to sharpen your skills, this will equip you with the knowledge you need to understand the basics of Machine Learning.

So, Let’s Start!!!

Content:

  1. What is Supervised Machine Learning?
  2. Numerical analysis for Linear Regression.
  3. Implementing a linear regression model with Python3.
  4. Gradient descent for convergence.

1. What is Supervised Machine Learning?

Supervised learning vs Unsupervised learning

Supervised learning and unsupervised learning are the two fundamental types of machine learning. Supervised machine learning, which is estimated to generate the overwhelming majority of the economic value produced by machine learning today, is the driving force behind most of the applications we use in our daily lives.

The basic idea is simple but incredibly powerful: we provide the model with a set of labeled examples, and it learns to map the input data to the corresponding output. It’s like having a teacher guide the algorithm through a lesson, pointing out the right answers. From email spam filters to self-driving cars, supervised learning enables machines to make predictions, generalize knowledge, and perform complex tasks based on training data.

Supervised machine learning is based on regression and classification. Regression models predict numerical or continuous values, while classification models predict discrete, categorical values. The most fundamental regression algorithms are linear regression and polynomial regression. Here we go with linear regression.

2. Numerical Analysis for Linear Regression

If you have studied numerical analysis in mathematics, here’s a quick review of it, because in machine learning we are going to implement something very similar to what you learned there.

Here is a quick example of how to find the best line P1(x) = a1*x + a0 for given x|y data using numerical methods; we can find the a0 and a1 coefficients as below:

Calculating a0 and a1 parameters with Numerical Methods
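The figure above contains the standard closed-form least-squares formulas for the slope a1 and intercept a0. As a quick sanity check, here is a minimal NumPy sketch of exactly those formulas, reusing the toy data from the sklearn example above:

import numpy as np

# x|y data (same toy values as in the sklearn example)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([121, 237, 401, 445])

n = len(x)
# Least-squares formulas for P1(x) = a1*x + a0
a1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a0 = np.mean(y) - a1 * np.mean(x)
print(f"a1 = {a1:.2f}, a0 = {a0:.2f}")  # slope and intercept of the best-fit line

Running it gives a1 = 113.6 and a0 = 17.0, the same line the sklearn model found above.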

Here is a quick example of how to find the best polynomial P2(x) = a2*x**2 + a1*x + a0 for given x|y data using numerical methods; we can find the a0, a1 and a2 coefficients as below:

Calculating a0, a1 and a2 parameters with Numerical Methods
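Likewise, the coefficients of the second-degree polynomial come from a small 3×3 linear system (the normal equations). Here is a minimal NumPy sketch; the x|y values are made up purely for illustration:

import numpy as np

# Made-up x|y data, only to illustrate the calculation
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 1.8, 5.1, 10.2, 17.0])

# Design matrix with columns [1, x, x**2] for P2(x) = a2*x**2 + a1*x + a0
A = np.column_stack([np.ones_like(x), x, x**2])
# Solve the normal equations (A^T A) a = A^T y
a0, a1, a2 = np.linalg.solve(A.T @ A, A.T @ y)
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}, a2 = {a2:.3f}")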

Now, let’s try to implement an algorithm that we can run on a computer to find the best line, instead of doing these mathematical calculations by hand.

We are still using the method of least squares, but rather than solving the normal equations directly, we will minimize the squared error iteratively with gradient descent.

3. Implementing a Linear Regression Model with Python3

Now we’ll walk through the implementation of a linear regression model step by step. Don’t worry if you’re not a coding wizard; we’ll break it down into small steps. By the end, you’ll have a working model ready to make predictions.

First, we have to define a class to implement the model. It’s like defining our own data type, whose objects come with their own methods and attributes.

Attributes in an object are like variables holding information about its properties, representing details or facts. Meanwhile, methods are functions tied to the object, specifically crafted to execute actions or operations that involve the object’s attributes.

Example for defining a class for a new data type of Car Objects
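The original shows this example as a code image; a minimal reconstruction consistent with the description below could look like this (the color and brand values are just illustrative):

class Car:
    def __init__(self, color, brand):
        # Attributes: facts about this particular car
        self.color = color
        self.brand = brand
        self.is_engine_started = False

    def start_engine(self):
        # Method: an action that changes the car's state
        self.is_engine_started = True
        print(f"The {self.color} {self.brand}'s engine has started.")

    def honk_horn(self):
        # Method: an action performed by the car
        print(f"The {self.brand} goes 'Beep beep!'")

my_car = Car("red", "Toyota")       # create an instance (example values)
print(my_car.color, my_car.brand)   # access attributes
my_car.start_engine()               # invoke methods
my_car.honk_horn()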

In this example, we define a Car class with attributes (color, brand, is_engine_started) and methods (start_engine, honk_horn). The __init__ method initializes the object with specified color and brand attributes. The start_engine method simulates starting the car's engine, and the honk_horn method simulates honking the car's horn.

The instance my_car is created, and we access its attributes and invoke its methods, showcasing how attributes store information about the car, while methods perform actions related to the car's attributes.

Now let’s create a class for our model. Before that, we have to learn how to work with vectors and matrices in NumPy;

# NumPy routines to allocate memory and fill it with user-specified values
a = np.array([[5, 4, 3]]); print(f" a shape = {a.shape}, np.array: \n a = \n{a}")
a = np.array([[5],   # One can also
              [4],   # separate values
              [3]])  # into separate rows
print(f" a shape = {a.shape}, np.array: \n a = \n{a}")
Output 1:
# vector indexing operations on matrices
c = np.arange(6)
print(f"c.shape: {c.shape}, \n c = \n{c}")

# reshape is a convenient way to create matrices
# reshape(2, 3) means 2 rows and 3 columns
b = np.arange(6).reshape(2, 3)
print(f"b.shape: {b.shape}, \n b = \n{b}")
# reshape(-1, 2) means 2 columns and as many rows as needed
a = np.arange(6).reshape(-1, 2)
print(f"a.shape: {a.shape}, \n a = \n{a}")

# access an element
print(f"\na[2,0].shape: {a[2, 0].shape}, a[2,0] = {a[2, 0]}, type(a[2,0]) = {type(a[2, 0])} Accessing an element returns a scalar")

# access a row
print(f"a[2].shape: {a[2].shape}, a[2] = {a[2]}, type(a[2]) = {type(a[2])}")

# access a sub-matrix (slice)
print(f"a[1:2,0:1].shape: {a[1:2,0:1].shape}, a[1:2,0:1] = {a[1:2,0:1]}, type(a[1:2,0:1]) = {type(a[1:2,0:1])}")
print(f"a[1:3,:].shape: {a[1:3,:].shape}, \n a[1:3,:] = \n{a[1:3,:]}, type(a[1:3,:]) = {type(a[1:3,:])}")
Output 2:

This knowledge of NumPy arrays is extremely important when working with data sets, because the X features and Y targets of our data set are stored as NumPy arrays.

Now let’s define a class to implement a linear regression model as below, and then we’ll walk through the implementation step by step;
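The full class, assembled from the methods we will walk through below, looks like this:

import numpy as np

class LinearRegression():
    # Constructor: store the learning rate and the number of iterations
    def __init__(self, learning_rate, no_of_itr):
        self.learning_rate = learning_rate
        self.no_of_itr = no_of_itr

    # Train the model on features X and targets Y using gradient descent
    def fit(self, X, Y):
        self.X = X
        self.Y = Y
        self.m, self.n = X.shape          # number of examples and features
        self.w = np.zeros((self.n, 1))    # weights, initialized to zeros
        self.b = 0                        # bias, initialized to zero
        for _ in range(self.no_of_itr):
            self.gradient_descent()

    # Half of the mean squared error between predictions and targets
    def compute_cost(self):
        Y_prediction = self.predict(self.X)
        return (1/(2*self.m)) * np.sum((Y_prediction - self.Y)**2)

    # One gradient descent update of the parameters
    def gradient_descent(self):
        Y_prediction = self.predict(self.X)
        djdw = (1/self.m) * (self.X.T).dot(Y_prediction - self.Y)
        djdb = (1/self.m) * np.sum(Y_prediction - self.Y)
        self.w = self.w - self.learning_rate * djdw
        self.b = self.b - self.learning_rate * djdb

    # Model output f_wb(X) = X.w + b
    def predict(self, X):
        return X.dot(self.w) + self.b

    # Return the learned parameters
    def return_weights(self):
        return self.w, self.b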

Now let’s go through the intuition behind each step;

Class Definition:

In Python, you can define a class with the class keyword.

class LinearRegression():

Constructor Method:

def __init__(self, learning_rate, no_of_itr):
    self.learning_rate = learning_rate
    self.no_of_itr = no_of_itr
  • This is the constructor method that initializes the linear regression model with the specified learning rate and the number of iterations for gradient descent.
  • learning_rate: Determines the step size at each iteration of the gradient descent.
  • no_of_itr: Specifies how many iterations the gradient descent will perform.

fit Method:

def fit(self, X, Y):
    self.X = X
    self.Y = Y
    self.m, self.n = X.shape
    self.w = np.zeros((self.n, 1))
    self.b = 0
    for _ in range(self.no_of_itr):
        self.gradient_descent()
  • The fit method is used to train the linear regression model with the given training data (X) and target values (Y).
  • self.w and self.b are weights and bias initialized as zeros, representing the parameters of the linear regression model.
  • It iteratively performs gradient descent (self.gradient_descent()) for the specified number of iterations.

Gradient Descent and Cost Function in Machine Learning:

Cost function and Model output

Gradient Descent is a fundamental optimization algorithm employed in machine learning to fine-tune models and minimize the error between predicted and actual outcomes. At its core, the process resembles finding the steepest downhill path on a mountain — adjusting model parameters iteratively to reach the optimal values that minimize the difference between predicted and actual results. The algorithm calculates the gradient of the cost function, a measure of the model’s performance, with respect to each parameter. The cost function quantifies the error between predicted and actual values, providing a metric for the model’s accuracy. By adjusting parameters in the opposite direction of the gradient, the model converges towards optimal values, enhancing its predictive capabilities. In essence, gradient descent is the guide steering the model towards precision, making it a cornerstone in the training of machine learning models.

Cost function and its partial derivative:

The goal of any machine learning model is to minimize the cost function. The gradient descent update equation iteratively adjusts the parameters to reduce the error (the cost) between predictions and actual values:
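Written out in standard notation (matching the code below), the squared-error cost function and the update rule for a single feature are:

J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2, \qquad f_{w,b}(x) = w\,x + b

repeat until convergence:

w := w - \alpha \frac{\partial J}{\partial w} = w - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}

b := b - \alpha \frac{\partial J}{\partial b} = b - \alpha \cdot \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)

where \alpha is the learning rate and m is the number of training examples.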

Term Explanation:

The gradient represents the rate of change of the cost function with respect to each parameter, guiding the model toward minimizing prediction errors. Utilizing the principles of gradient descent, the iterative process involves updating model parameters in the direction opposite to the gradient. This step-wise refinement continues until the cost function reaches its minimum, signifying an optimal configuration for the model.

compute_cost Method:

The cost function for the Linear Regression Model
def compute_cost(self):
    Y_prediction = self.predict(self.X)
    return (1/(2*self.m)) * np.sum((Y_prediction - self.Y)**2)
  • The compute_cost method calculates the cost function, which represents the difference between the predicted values and the actual target values.
  • It returns half of the mean squared error (the 1/2 factor simply makes the derivative cleaner), a measure of how well the model is performing.

gradient_descent Method:

Approach for Gradient descent
def gradient_descent(self):
    Y_prediction = self.predict(self.X)
    djdw = (1/self.m) * (self.X.T).dot(Y_prediction - self.Y)
    djdb = (1/self.m) * np.sum(Y_prediction - self.Y)
    self.w = self.w - self.learning_rate * djdw
    self.b = self.b - self.learning_rate * djdb
  • The gradient_descent method updates the model's weights (self.w) and bias (self.b) based on the gradients of the cost function.
  • Y_prediction represents the predicted values using the current model parameters.
  • The gradients are calculated using the partial derivatives of the cost function with respect to weights (djdw) and bias (djdb).
  • The weights and bias are updated using the gradient descent algorithm.

predict Method:

def predict(self, X):
    return X.dot(self.w) + self.b
  • The predict method generates predictions using the trained model parameters.
  • It computes the dot product of the input features (X) with the weights (self.w) and adds the bias (self.b).

return_weights Method:

def return_weights(self):
    return self.w, self.b
  • The return_weights method returns the final weights (self.w) and bias (self.b) after training the model.

These methods collectively define a simple linear regression model capable of training on data and making predictions. The gradient descent algorithm is utilized to optimize the model parameters for better predictive performance.

4. Gradient Descent for Convergence

Now let’s solve a practical single-feature example with our model and fine-tune it for optimal performance.

We should choose an appropriate learning rate for gradient descent to converge to the minimum of the cost function. Otherwise, as the picture below shows, it’s not going to end well.

Image credit: Stanford CS231n

To choose a good learning rate we can use the compute_cost method I added to the model class, which lets us observe the values of the cost function while the model is training.
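For example, here is a small sketch (my own variation, not part of the class definition above) of a fit method that prints the cost every 200 iterations so you can watch whether it is decreasing:

# A fit() variant that reports the cost during training,
# so you can see whether gradient descent is converging or diverging.
def fit(self, X, Y):
    self.X = X
    self.Y = Y
    self.m, self.n = X.shape
    self.w = np.zeros((self.n, 1))
    self.b = 0
    for i in range(self.no_of_itr):
        self.gradient_descent()
        if i % 200 == 0:  # report progress every 200 iterations
            print(f"Iteration {i:5d} : cost = {self.compute_cost():.4f}")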

First, let’s import our dataset. Here I’m going to train our model to predict salary based on a single feature, years of experience in a job.

The dataset and the example jupyter notebook are in this GitHub repo Link.

Import required python packages;

import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split  # train_test_split lets us easily split our data set into training and test sets

Run the cell defining the LinearRegression class;

class LinearRegression(): 

# Constructor function to store the learning rate and no. of iterations
def __init__(...........................

Import the data set as a pandas DataFrame object;

df = pd.read_csv('Salaries.csv')
df.head()   # To view the first 5 rows
df.shape    # To view the shape of the data frame

Now let’s divide the columns into X feature vectors and y targets, and then do a train-test split with 20% of the data for testing and 80% for training;

X = df['YearsExperience'].values.reshape(-1, 1) 
y = df['Salary'].values.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # testing data percentage = 0.2 = 20%

Time to define a variable to store our LinearRegression model object. Here I used a learning rate of 0.03 and 2000 iterations.

model = LinearRegression(learning_rate=0.03, 
no_of_itr=2000)
model.fit(X_train, y_train)

The model.fit() method will train our model on the training data set X_train, y_train. Then let’s see our model’s w, b parameter values;

w,b = model.return_weights()
print('Weights for the respective features are : \n' ,w)
print('Bias value for the regression is ', b)

Let’s try a few predictions with our test data set;

Y_predict = model.predict(X_test)
print(y_test)
print(Y_predict)

As a final test, let’s plot the data set together with our f_wb(x) = w*x + b line to see whether our model fits the data well;

plt.scatter(df['YearsExperience'], df['Salary']) 
plt.xlabel('X Features')
plt.ylabel('Y Predictions and Y Targets')
plt.title('Linear Regression Implementation')
w,b = model.return_weights()
X = df['YearsExperience'].values
plt.plot(X, w[0][0] * X + b, color='red')
plt.legend(["Y Targets", "Y Predictions"], loc ="lower right")
plt.show()

Congratulations! You’ve just covered the foundations of supervised machine learning and the nuts and bolts of linear regression in this article. See you in the next article on logistic regression, which is the basis for classification models. Happy coding!
