Understanding Linear Regression Algorithm with Practical Example

Jatin Mehra
5 min read · Jul 12, 2024


Imagine trying to use machine learning algorithms without understanding how they work. They aren’t black boxes or magical solutions. By delving into the mechanics of these algorithms, you can unlock their full potential and apply them more effectively.

In this blog, we will first go over the underlying math, then use it to train a linear regression model from scratch, without any library like scikit-learn.

Let’s start

  • Linear Regression-
    A linear model makes a prediction by simply computing a weighted sum of the input features, plus a constant called the bias term (also called the intercept term):

    ŷ = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ

This equation can be written in vectorized form:

    ŷ = hθ(x) = θᵀ · x

where θ is the model's parameter vector (the bias term θ₀ plus the feature weights θ₁ … θₙ) and x is the instance's feature vector, with x₀ = 1 so the bias term is included in the dot product.
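To make this concrete, here is a minimal sketch (the weights and feature values below are made up purely for illustration) showing that a prediction is just a dot product:

import numpy as np

theta = np.array([2.0, 3.5, -1.2])  # [bias, weight_1, weight_2] - made-up values
x = np.array([1.0, 4.0, 2.0])       # x_0 = 1 for the bias term, then two features

y_pred = theta @ x                  # weighted sum: 2.0 + 3.5*4.0 - 1.2*2.0 = 13.6
print(y_pred)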

Now we know how the model calculates its predictions. But how do we train it?
Training a model means setting its parameters so that the model best fits the training set.
To measure how well (or poorly) the model fits the training data, we use performance measures such as the root mean squared error (RMSE), the mean squared error (MSE), the mean absolute error (MAE), etc.
We need to find the value of θ that minimizes the chosen performance measure; here we will use the MSE:

MSE(X, hθ) = (1/m) · Σ (θᵀ x⁽ⁱ⁾ − y⁽ⁱ⁾)², with the sum running over i = 1 … m

In this equation:

  • m is the number of instances in the dataset.
  • x⁽ⁱ⁾ is a vector of all the feature values (excluding the label) of the ith instance in the dataset, and y⁽ⁱ⁾ is its label (the desired output value for that instance).
  • X is a matrix containing all the feature values (excluding labels) of all instances in the dataset. There is one row per instance, and the ith row is equal to the transpose of x⁽ⁱ⁾, noted (x⁽ⁱ⁾)ᵀ.
  • h is the prediction function, also called the hypothesis.
  • MSE(X, h) is the cost function.
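As a quick illustration, the cost function is simple to compute with NumPy; the toy arrays below are made up just to show the shape of the computation:

import numpy as np

X = np.array([[1.0, 4.0], [1.0, 7.0], [1.0, 9.0]])  # 3 instances, first column is x_0 = 1
y = np.array([10.0, 18.0, 25.0])                    # labels
theta = np.array([1.5, 2.4])                        # some candidate parameters

predictions = X @ theta                             # h(x) for every instance at once
mse = np.mean((predictions - y) ** 2)               # average of the squared errors
print(mse)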

To find the value of θ that minimizes the MSE, there is a closed-form solution called the Normal equation:

θ̂ = (Xᵀ X)⁻¹ Xᵀ y

where θ̂ is the value of θ that minimizes the cost function and y is the vector of target values.
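Before applying it to real data, here is a quick sanity check on synthetic data: we generate points from known parameters (4 and 3 below, chosen arbitrarily) and confirm the Normal equation recovers them, up to noise.

import numpy as np

np.random.seed(42)
m = 100
X = np.random.rand(m, 1)                        # one feature
X_b = np.c_[np.ones((m, 1)), X]                 # add x_0 = 1 for the bias term
y = 4 + 3 * X[:, 0] + np.random.randn(m) * 0.1  # y = 4 + 3x + small noise

theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta_hat)                                # should be close to [4, 3]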

Time to put these mathematical equations to work.

We will utilize a dataset to predict insurance charges based on various features. These features include age, sex, smoking status, region of residence, BMI (Body Mass Index), and the number of children. By analyzing these variables, we aim to develop a predictive model for insurance charges.

import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.read_csv("insurance.csv")  # load data
df.head()      # peek at the first few rows
df.info()      # column types and missing values
df.describe()  # summary statistics

Let’s deal with the non-numerical values

df['sex'] = df['sex'].map({'male': 1, 'female': 0})

df['smoker'] = df['smoker'].map({'yes': 1, 'no': 0})

def onehotencoder(region):
    # Map each region to a one-hot encoded vector
    encoding = {
        'southeast': [1, 0, 0, 0],
        'southwest': [0, 1, 0, 0],
        'northwest': [0, 0, 1, 0],
        'northeast': [0, 0, 0, 1]
    }
    return encoding.get(region, [0, 0, 0, 0])  # Default to all zeros if region not found

df[['southeast', 'southwest', 'northwest', 'northeast']] = df['region'].apply(onehotencoder).apply(pd.Series)
df.drop('region', axis=1, inplace=True)  # the original column is no longer needed
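As an aside, pandas ships a built-in helper that produces an equivalent encoding; had we not written the encoder by hand, something like this would work (note that it names the columns region_southeast, region_southwest, etc., rather than using the bare region names):

# One-liner alternative to the manual encoder above
raw_df = pd.read_csv("insurance.csv")
encoded_df = pd.get_dummies(raw_df, columns=['region'], dtype=int)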

Now our data is almost ready; we just need to normalize it.

# Normalize the data using standardization: (value - mean) / standard deviation
def normalize(data):
    numeric_data = data.select_dtypes(include=[np.number])
    mu = numeric_data.mean()
    sigma = numeric_data.std()
    normalized_data = (numeric_data - mu) / sigma
    return normalized_data

norm_df = normalize(df)
norm_df.head()
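A quick check that the standardization worked: every numeric column should now have a mean of about 0 and a standard deviation of about 1.

print(norm_df.mean().round(3))  # ~0 for every column
print(norm_df.std().round(3))   # ~1 for every column

With the data standardized, let's write our own train/test split function: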
def train_test_split(data, test_size=0.33, random_seed=None):
    if random_seed is not None:
        np.random.seed(random_seed)

    # Shuffle the data
    shuffled_indices = np.random.permutation(len(data))

    # Split the indices
    test_set_size = int(len(data) * test_size)
    test_indices = shuffled_indices[:test_set_size]
    train_indices = shuffled_indices[test_set_size:]

    # Create train and test sets
    train_set = data.iloc[train_indices]
    test_set = data.iloc[test_indices]

    return train_set, test_set
train, test = train_test_split(norm_df, test_size=0.20, random_seed=42)  # split the normalized data, not the raw df
train_X, train_y = train.drop('charges', axis=1).copy(), train['charges'].copy()
test_X, test_y = test.drop('charges', axis=1).copy(), test['charges'].copy()
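A quick sanity check on the split sizes:

print(train_X.shape, test_X.shape)  # should be roughly an 80/20 row split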

Now it's time to use the Normal equation to build a linear regression model from scratch.

class LinearRegression:
    def __init__(self):
        self.theta = 0

    def fit(self, X, y):
        # Normal equation: theta = (X^T X)^-1 X^T y
        # Note: no bias column is added here; since the features and the
        # target are standardized, the intercept is approximately zero.
        self.theta = np.linalg.inv(X.T @ X) @ X.T @ y
        return self

    def coef_(self):
        return self.theta

    def predict(self, X):
        X = X.values
        return X @ self.theta

# Train the model
reg = LinearRegression()
reg.fit(train_X, train_y)

# Evaluate the model
predictions = reg.predict(test_X)
mse = np.mean((predictions - test_y) ** 2)
rmse = np.sqrt(mse)
print(f"RMSE : {rmse} MSE : {mse}")

Now let’s see how Scikit-Learn’s Linear Regression model performs.

from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(train_X, train_y)
pred_y = reg.predict(test_X)
mse = np.mean((pred_y - test_y) ** 2)
rmse = np.sqrt(mse)
print(f"RMSE : {rmse} MSE : {mse}")

The results are pretty similar. I hope this short blog helped you. I’m open to suggestions and corrections. Thank you!

You can find the code and dataset here: GitHub
