Uni-Variate Linear Regression

Jagajith · Published in CodeX · 7 min read · Sep 1, 2021

Hello there, everyone. Today we are stepping into our first supervised learning algorithm: linear regression. In this post, we will code the algorithm from scratch and then use it as our prediction model.

What is Linear Regression?

Linear regression is a statistical supervised learning technique that models a linear relationship between one or more independent features and a dependent variable. The fitted line is often called the "line of best fit". The basic idea of linear regression is to find the straight line that best fits the set of data points.

Model Representation

Notations:

  • m → number of training examples
  • x → input (independent) variable
  • y → output (target) variable
  • (x⁽ⁱ⁾, y⁽ⁱ⁾) → the i-th training example
  • θ → the parameters of the model
  • h → the hypothesis function

The process of regression, in general, is explained by the following flow chart:

To generate a hypothesis (here, a line), the training data is fed into the learning algorithm. A hypothesis is a function that uses input data to predict output values; in other words, the hypothesis h is a function that maps x's to y's.

In the case of Linear regression the Hypothesis (h) is represented by the following equation:

h(x) = θ₀ + θ₁x

Pictorial representation of the parameters and of the hypothesis:

Hypothesis (a straight line fitted through the data points)

From the above picture, we can clearly see a linear trend: as the value of x increases, the value of y also increases.

Here, x is the independent variable, y is the output variable, θ₀ is the intercept (a constant), and θ₁ is the slope coefficient.
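
For intuition, here is a tiny sketch of evaluating the hypothesis for hand-picked, hypothetical parameter values (these are not values learned from any data):

def hypothesis(x, theta0, theta1):
    return theta0 + theta1 * x        # h(x) = θ₀ + θ₁x

print(hypothesis(3.0, 1.0, 2.0))      # with θ₀ = 1 and θ₁ = 2: 1 + 2·3 = 7.0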

Consider a dataset including information on a food truck’s profits based on the population of a city.

Let's import the required libraries:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

And load the dataset into a pandas DataFrame:

data = pd.read_csv('ex1data1.txt', header=None)
data.head()
Output: data.head()

Here, column 0 (i.e. x, the population of a city in 10,000s) is the input variable, and column 1 (i.e. y, the profit in $10,000s) is the output variable.

data.describe()
Output: data.describe()

Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, we will use a scatter plot, since it has only two properties to plot (profit and population). Many other problems that we encounter in real life are multi-dimensional and cannot be plotted on a 2-D plot.

plt.scatter(data[0], data[1], marker="x", c='red', alpha=0.5)
plt.xticks(np.arange(5, 30, step=5))
plt.yticks(np.arange(-5, 30, step=5))
plt.xlabel("Population of a city(10,000s)")
plt.ylabel("Profit($10,000)")
plt.title("Profit vs. Population")
Output: Scatter plot

From this graph we can see the linear trend: as the population increases, the profit also increases.

Because we are using linear regression, the hypothesis takes the form of the equation of a straight line. In general, any kind of function could be used as the hypothesis h(x) to fit the data; here, a straight line is enough.

How to Choose Good Parameters for the Hypothesis?

The goal is to choose the parameters θ₀ and θ₁ so that h(x) is close to the corresponding value of y for each training example x.
This condition can be expressed numerically as follows:

J(θ₀, θ₁) = (1 / 2m) · Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)², where the sum runs over all m training examples

This expression is called the cost function or loss function, and its result is called the cost or loss; at the end of the day, they all mean the same thing. There are many types of cost functions available. The mean squared error (MSE) cost function shown above is the one usually used in regression problems.
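
As a quick numeric illustration of this formula (with made-up numbers, not the food truck data), the cost of one candidate line on a tiny toy dataset can be computed like this, reusing numpy as np imported above:

x_toy = np.array([1.0, 2.0, 3.0])                   # hypothetical inputs
y_toy = np.array([2.0, 2.5, 3.5])                   # hypothetical targets
theta0, theta1 = 0.5, 1.0                           # hand-picked, hypothetical parameters
h = theta0 + theta1 * x_toy                         # hypothesis for each example: [1.5, 2.5, 3.5]
J_toy = 1 / (2 * len(y_toy)) * np.sum((h - y_toy) ** 2)
print(J_toy)                                        # (0.25 + 0 + 0) / 6 ≈ 0.0417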

Pictorial representation of Good vs Bad Hypothesis:

Good vs Bad Hypothesis

From the above graph we can infer that the points are far from the brown line, while they lie close to the green line. So the green line has a low cost (a good hypothesis) and the brown line has a high cost (a bad hypothesis). The cost function above lets us find the line that minimizes this loss.

Let's further analyze the cost function:

Source: Andrew Ng

def compute_cost(X, y, theta):
    m = len(y)                                  # number of training examples
    h_theta = X.dot(theta)                      # vectorized hypothesis for all examples
    J = 1/(2*m) * np.sum((h_theta - y)**2)      # mean squared error cost
    return J

mod_data = data.values
m = len(mod_data[:, -1])                        # number of training examples
X = np.column_stack((np.ones((m, 1)), mod_data[:, 0].reshape(m, 1)))   # prepend a column of ones for the intercept
y = mod_data[:, 1].reshape(m, 1)
theta = np.zeros((2, 1))                        # initialize parameters to zero

iterations = 1500
alpha = 0.01
J = compute_cost(X, y, theta)
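
Before running gradient descent, it is worth printing this initial cost (computed with the all-zero θ above), so we can later confirm that gradient descent actually reduces it:

print(f"Initial cost with theta = [0, 0]: {J:.2f}")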

Gradient Descent to Minimize the Cost Function

Gradient descent is one of the most popular algorithms for minimizing the cost function.

  • Have some function J(θ₀, θ₁)
  • Goal: minimize J(θ₀, θ₁)
  • Start with some initial θ₀, θ₁.
  • Keep changing θ₀, θ₁ to reduce J(θ₀, θ₁) until we hopefully end up at a minimum.
Source: Andrew Ng

Parameter Update Rule:

  • Find the hypothesis → h(x) = θ₀ + θ₁x
  • Then use the hypothesis and the output variable in the data to find the cost → J(θ₀, θ₁) = (1 / 2m) · Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
  • Then update the parameters as follows, repeating until convergence:

    θ₀ := θ₀ − (α/m) · Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾)
    θ₁ := θ₁ − (α/m) · Σᵢ (h(x⁽ⁱ⁾) − y⁽ⁱ⁾) · x⁽ⁱ⁾

Alpha, or the learning rate, decides the size of the steps the algorithm takes to reach the minimum cost value (i.e., the parameter values that give the minimum cost). A very important detail to notice is that θ₀ and θ₁ must be updated simultaneously. We shouldn't update θ₀, recompute the hypothesis, and then update θ₁; that does not work the way we want. The sketch below illustrates the difference.
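
A minimal sketch of simultaneous vs. sequential updates, on the same hypothetical toy dataset as in the cost example above (all values are purely for illustration):

x_toy = np.array([1.0, 2.0, 3.0])                    # hypothetical inputs
y_toy = np.array([2.0, 2.5, 3.5])                    # hypothetical targets
t0, t1, lr = 0.0, 0.0, 0.1                           # hypothetical starting parameters and learning rate
m_toy = len(y_toy)

h = t0 + t1 * x_toy                                  # hypothesis using the OLD parameters
grad0 = (1 / m_toy) * np.sum(h - y_toy)              # partial derivative of J w.r.t. θ₀
grad1 = (1 / m_toy) * np.sum((h - y_toy) * x_toy)    # partial derivative of J w.r.t. θ₁

# Correct (simultaneous): both new values are computed from the old parameters.
t0, t1 = t0 - lr * grad0, t1 - lr * grad1

# Incorrect (sequential): updating t0 first and then recomputing grad1 from the
# new t0 mixes old and new parameters and gives a different, unintended step.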

def gradientDescent(X, y, theta, alpha, n_iters, graph=True):
    m = len(y)
    J_history = []

    for i in range(n_iters):
        h_theta = X.dot(theta)              # hypothesis for all examples
        err = np.dot(X.T, (h_theta - y))    # X.T (h - y), summed over the training set
        descent = alpha * 1/m * err         # step for each parameter
        theta -= descent                    # simultaneous (vectorized) update

        J_history.append(compute_cost(X, y, theta))

    if graph:
        plt.plot(J_history)
        plt.xlabel("No. of Iterations")
        plt.ylabel("J(theta)")
        plt.title("Cost function using Gradient Descent")

    return theta, J_history

theta, J_history = gradientDescent(X, y, theta, alpha, iterations)
print(f"h(x) = {round(theta[0, 0], 2)} + {round(theta[1, 0], 2)} x1")

The initial parameter values are set to zero and the learning rate is set to 0.01, with a maximum of 1,500 iterations (epochs). Plotting the cost function against the number of iterations gives a nice descending trend, indicating that the gradient descent implementation works in reducing the cost function.
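
Since gradientDescent also returns the history of costs, we can print the final cost and compare it against the initial one printed earlier:

print(f"Final cost after {iterations} iterations: {J_history[-1]:.2f}")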

Now, with the optimized θ values, we can plot the data together with the predicted values (the line of best fit), as in the sketch below.
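
A minimal plotting sketch, reusing the X, y, and theta computed above (the label strings are just illustrative):

plt.scatter(X[:, 1], y, marker="x", c='red', alpha=0.5, label="Training data")
plt.plot(X[:, 1], X.dot(theta), c='blue', label="Line of best fit")
plt.xlabel("Population of a city(10,000s)")
plt.ylabel("Profit($10,000)")
plt.title("Profit vs. Population")
plt.legend()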

Predictions

To make predictions:

def predict(X, theta):
    predictions = np.dot(theta.T, X)    # θᵀx for a single example x (including the leading 1 for the intercept)
    return predictions[0]

predict1 = predict(np.array([1, 3.5]), theta) * 10000    # population of 35,000, with a 1 prepended for the intercept
print(f'For population=35000, we predict a profit of ${round(predict1, 0)}')

predict2 = predict(np.array([1, 7]), theta) * 10000      # population of 70,000
print(f'For population=70000, we predict a profit of ${round(predict2, 0)}')
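
As an optional sanity check (assuming scikit-learn is installed; it is not otherwise used in this post), the learned parameters can be compared with the closed-form fit from scikit-learn's LinearRegression:

from sklearn.linear_model import LinearRegression

reg = LinearRegression()
reg.fit(mod_data[:, 0].reshape(-1, 1), mod_data[:, 1])   # population → profit
print(reg.intercept_, reg.coef_[0])                      # should be close to theta[0, 0] and theta[1, 0]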

Conclusion

Today, we saw the concepts behind the hypothesis, cost function, and gradient descent of uni-variate linear regression, and used them to fit a regression line to the food truck profits dataset. Everything was built from scratch using Python's numpy, pandas and matplotlib. The dataset and final code are uploaded to GitHub.

Check it out here: Linear Regression.

If you like this post, then check out my other posts in this series:

1. What is Machine Learning?

2. What are the Types of Machine Learning?

3. Multi-Variate Linear Regression

4. Logistic Regression

5. What are Neural Networks?

6. Digit Classifier using Neural Networks

7. Image Compressing with K-means Clustering

8. Dimensionality Reduction on Face using PCA

9. Detect Failing Servers on a Network using Anomaly Detection

Last Thing

If you enjoyed my article, a clap 👏 and a follow would be amazing, and it helps Medium promote this article so that others may read it. I am Jagajith, and I will catch you in the next one.
