Andrew Ng’s Linear Regression Exercise — A Python Solution

Cheche
5 min read · Feb 23, 2020


Linear Regression in One Variable

This is a Python implementation of the linear regression exercise in week 2 of Coursera’s online Machine Learning course, taught by Dr. Andrew Ng. We are given data on the population of various cities and the profit a restaurant made in each city. The task is to use linear regression to determine which new cities the restaurant should expand to; specifically, our goal is to model how population affects profit.

NOTE: The Python code below is not the most optimized solution to this problem. I tried to keep the implementation as close as possible to the Matlab/Octave code used in the course, so anyone converting from Matlab/Octave can follow along easily.

Let’s start by importing relevant python libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Let’s import the data into a Pandas dataframe called data. There are two columns in the data set: Population and Profit.

data = pd.read_csv('ex1data1.txt', names=['Population', 'Profit'])

Let’s view the first 5 records in the data set.

data.head()  # Viewing the first 5 records in the data set.
First 5 records of the data set

Next, we will go ahead and visualize the relationship between Population and Profit. We will use matplotlib to create a scatter plot of population against profit.

X = data['Population'].values  # Assign 'Population' to X
y = data['Profit'].values      # Assign 'Profit' to y
m = len(y)                     # The length of the training set
plt.scatter(X, y, c='red', marker='x')       # Plot the scatter plot
plt.ylabel('Profit in 10,000s')              # Label on the y axis
plt.xlabel('Population of City in 10,000s')  # Label on the x axis
plt.title('Scatter Plot of Training Data')   # Title for the plot

The plot is shown below:

Scatter plot of Population against Profit.

This graph shows that population is positively correlated with profit.
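To put a number on that claim (this quick check is not part of the original exercise), we can compute the Pearson correlation coefficient between the two columns; a value close to +1 indicates a strong positive linear relationship.

corr = data['Population'].corr(data['Profit'])  # Pearson correlation between the two columns
print(corr)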

Computing the Cost

The cost function is

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

where the hypothesis is h_\theta(x) = \theta_0 + \theta_1 x and m is the number of training examples.

Before we run the gradient descent algorithm to minimize the cost (the error of prediction), let’s determine the cost for our initial parameters.

X = np.append(np.ones([m, 1]), X.reshape(m, 1), axis=1)  # Append the bias term to X
y = y.reshape(m, 1)       # Reshape y to an m x 1 matrix
theta = np.zeros([2, 1])  # Set the initial coefficients to zero

def computeCost(X, y, theta):
    '''
    This function takes in the X and y matrices as well as the
    initial theta values (coefficients) and returns the cost
    (error of prediction).
    '''
    m = len(y)                            # The length of the training set
    h = X.dot(theta)                      # The hypothesis
    J = 1/(2*m) * np.sum((h - y)**2)      # The cost function
    return J                              # Return the cost

cost = computeCost(X, y, theta)  # Compute the cost for the initial theta
print(cost)
32.072733877455676

For the initial parameters we get a cost of 32.072733877455676. We will use gradient descent to drive this value down as far as possible.

Gradient Descent

The gradient descent update rule is

\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}

applied simultaneously for all j, or in vectorized form, \theta := \theta - \frac{\alpha}{m} X^T (X\theta - y).

Now that we know the cost, we will use gradient descent to minimize it. We will use alpha to represent the learning rate and run the algorithm for 1500 iterations.

iter = 1500   # Number of iterations
alpha = 0.01  # Learning rate

def gradientDescent(X, y, theta, alpha, iter):
    '''
    This function takes in the X and y matrices, the initial theta
    values (coefficients), the learning rate, and the number of
    iterations. The output is a new set of coefficients for the
    linear regression (theta), optimized for making predictions.
    '''
    J_history = []  # List for storing the cost value at each iteration
    m = len(y)      # The length of the training set

    for i in range(iter):                           # Loop for the given number of iterations
        h = X.dot(theta)                            # The hypothesis
        theta = theta - (alpha/m)*(X.T.dot(h - y))  # Gradient descent update
        J_history.append(computeCost(X, y, theta))  # Record the cost for this iteration
    return theta, J_history  # Return the final theta and the cost history

Now we call the function and pass in values for X, y, theta, alpha, and iter to compute the new coefficient values.

new_theta, J_history = gradientDescent(X, y, theta, alpha, iter)
print(new_theta)
[[-3.63029144]
[ 1.16636235]]

Our new coefficients are -3.63029144 and 1.16636235. Let’s use these theta values to recompute the cost and see how much of a reduction we achieved.

new_cost = computeCost(X,y, new_theta)
print(new_cost)
4.483388256587725

Using the new values of the coefficients to compute the cost, we see a huge improvement: from 32.07 down to 4.48.
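Because gradientDescent also returns J_history, we can plot the cost against the iteration number to confirm convergence. This plot is not part of the original exercise, but a curve that decreases steadily and flattens out is a good sign that the learning rate is reasonable.

plt.plot(J_history)  # Cost recorded at each iteration
plt.xlabel('Iteration')
plt.ylabel('Cost J')
plt.title('Convergence of Gradient Descent')
plt.show()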

Now let’s plot the linear regression line on the training data. Just like before, we first create a scatter plot of population against profit, then plot the regression line on top. To draw the line, we take the 2nd column of X, which is index 1 (Python is zero-indexed), and plot it against the hypothesis. Our hypothesis is h = X.dot(theta) or np.dot(X, theta); both produce the same result, as long as we remember to use the new values of theta (new_theta) obtained from the gradient descent function.

plt.scatter(X[:,1].reshape([m,1]),y, c='red', marker='x', label='Training Data')
plt.plot(X[:,1].reshape([m,1]), np.dot(X, new_theta), label='Linear Regression')
plt.ylabel('Profit in 10,000s')
plt.xlabel('Population of City in 10,000s')
plt.legend()
plt.title('Training Data with Linear Regression Fit')
Fitting the linear regression on the training data.

The regression line shows a reasonably good fit to the data.
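As an extra sanity check (not part of the exercise), we can compare the gradient descent coefficients against a closed-form least-squares solution; with 1500 iterations the two should agree closely.

# Closed-form least-squares fit for comparison. X already contains
# the bias column, so the result is directly comparable to new_theta.
theta_lstsq, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print(theta_lstsq)  # Expect values close to new_theta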

Predictions

Now we are provided with population figures for two cities, and our task is to predict the profit from each. It shouldn’t matter how many cities we have values for: our function should be able to predict profit for a single city or for a group of cities. In other words, it should accept a single vector containing the population of one city as well as a matrix containing populations for several cities. Here we pass in values for one city at a time and print out each prediction; a vectorized example follows the code below.

Now we define a function for predicting profit given population.

First_City = 35000   # i.e. 3.5 in units of 10,000 people
Second_City = 70000  # i.e. 7 in units of 10,000 people

def prediction(X, new_theta):
    '''
    This function takes in population and predicts profit.
    '''
    pred = np.dot(X, new_theta) * 10000  # Scale from units of $10,000 to dollars
    return pred                          # The prediction

predict1 = prediction([1, 3.5], new_theta)  # 35,000 people -> 3.5 in units of 10,000s
predict2 = prediction([1, 7], new_theta)    # 70,000 people -> 7 in units of 10,000s

print(f'For a population of {First_City} people, profit will be {predict1[0]}')   # Print prediction 1
print(f'For a population of {Second_City} people, profit will be {predict2[0]}')  # Print prediction 2
For a population of 35000 people, profit will be 4519.7678677017675
For a population of 70000 people, profit will be 45342.45012944714
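As promised, the same function handles several cities at once if we stack one [1, population] row per city. Here is a minimal sketch; the third population value is hypothetical, added only for illustration.

# Predict profit for several cities in one call. Populations are in
# units of 10,000 people; the 12.0 entry is a hypothetical new city.
populations = np.array([3.5, 7.0, 12.0])
X_new = np.column_stack([np.ones(len(populations)), populations])
preds = prediction(X_new, new_theta)  # One predicted profit per city
for pop, p in zip(populations, preds.ravel()):
    print(f'Population {pop * 10000:.0f}: predicted profit {p:.2f}')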

With what we have done so far, we can estimate how much profit the restaurant will generate in any city once we know its population, and so decide which cities to prioritize when expanding.

An IPython notebook can be found on GitHub here

You can check out other articles in the series by visiting the links below:

Linear regression with multiple variables (Part 1)

Linear regression with multiple variables (part 2) — using normal equation
