Python Implementation of Andrew Ng’s Machine Learning Course (Part 1)

Published in Analytics Vidhya · 6 min read · Aug 31, 2018

A few months ago I had the opportunity to complete Andrew Ng’s Machine Learning MOOC taught on Coursera. It serves as a very good introduction for anyone who wants to venture into the world of AI/ML. But there is a catch: the course is taught in Octave.

I always wondered how amazing this course could be if it were in Python. I finally decided to retake the course, this time completing the programming assignments in Python.

In this series of blog posts, I plan to write about the Python version of the programming exercises used in the course. I’m doing this for a few reasons:

It will help anyone who wants a Python version of the course (that includes me as well)
It will hopefully benefit R users who want to learn the Pythonic implementation of algorithms they are already familiar with

Prerequisites
It’s highly recommended that you first watch the week 1 video lectures.
You should have basic familiarity with the Python ecosystem.

In this section, we will look at the simplest Machine Learning algorithms.

Linear Regression with One Variable

First some context on the problem statement.
Here we will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities and you have data for profits and populations from the cities.

The file ex1data1.txt (available under week 2's assignment material) contains the dataset for our linear regression exercise. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative value for profit indicates a loss.

First, as with any machine learning task, we need to import a few libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Reading and Plotting the data

Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population).

(Many other problems that you will encounter in real life are multi-dimensional and can’t be plotted on a 2-d plot. To create multidimensional plots you have to be creative in using various aesthetics like colors, shapes, depths, etc.)

data = pd.read_csv('ex1data1.txt', header = None) #read from dataset
X = data.iloc[:,0] # read first column
y = data.iloc[:,1] # read second column
m = len(y) # number of training examples
data.head() # view first few rows of the data

Here we used the pandas read_csv function to read the comma separated values. Also, we have used the head function to view the first few rows of our data.

plt.scatter(X, y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.show()

Adding the intercept term

In the following lines, we add another dimension to our data to accommodate the intercept term (the reason for doing this is explained in the videos). We also initialize the parameters theta to 0, set the learning rate alpha to 0.01, and set the number of iterations to 1500.

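X = X.to_numpy()[:,np.newaxis] # pandas Series -> NumPy column vector of shape (m,1)
y = y.to_numpy()[:,np.newaxis] # recent pandas versions reject [:,np.newaxis] directly on a Series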
theta = np.zeros([2,1])
iterations = 1500
alpha = 0.01
ones = np.ones((m,1))
X = np.hstack((ones, X)) # adding the intercept term

Note on np.newaxis: When you read data into X and y, you will observe that they are rank 1 arrays. A rank 1 array has a shape of (m, ), whereas a rank 2 array has a shape of (m,1). When operating on arrays, it’s good to convert rank 1 arrays to rank 2 arrays, because rank 1 arrays often give unexpected results when combined with column vectors.
To convert a rank 1 array to a rank 2 array we use someArray[:,np.newaxis].
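To see what “unexpected results” can look like, here is a minimal sketch on a made-up three-element array (the names a, col, surprise and residual are only for illustration). It shows how mixing a rank 1 array with a column vector silently broadcasts to a full matrix.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # rank 1 array, shape (3,)
col = a[:, np.newaxis]          # rank 2 column vector, shape (3, 1)

print(a.shape, col.shape)

# Subtracting a rank 1 array from a column vector broadcasts to a 3x3 matrix,
# which is rarely what you want when computing prediction errors.
surprise = col - a
print(surprise.shape)

# With both operands as column vectors the result keeps the shape (3, 1).
residual = col - col
print(residual.shape)
```

This is why both X and y are reshaped to (m,1) before any arithmetic: the difference between predictions and targets then stays a column vector.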

Next we will compute the cost and run gradient descent. The way to do this is very well explained by Andrew Ng in the video lectures. I am only providing the Python code for the pseudocode which Andrew Ng uses in the lectures.

Computing the cost

def computeCost(X, y, theta):
    temp = np.dot(X, theta) - y # prediction errors, shape (m,1)
    return np.sum(np.power(temp, 2)) / (2*m) # half the mean squared error
J = computeCost(X, y, theta)
print(J)

You should expect to see a cost of 32.07.
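If you want to check the cost function by hand, a tiny example helps. The sketch below uses three made-up points that lie exactly on y = 2x, and a hypothetical helper compute_cost that takes m from y instead of relying on a global variable. It is an illustration, not the course's reference code.

```python
import numpy as np

def compute_cost(X, y, theta):
    """Half the mean squared error of the predictions X @ theta."""
    m = len(y)
    residuals = X @ theta - y
    return np.sum(residuals ** 2) / (2 * m)

# Made-up data: y = 2x exactly, with a leading column of ones for the intercept
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_toy = np.array([[2.0], [4.0], [6.0]])

# With theta = 0 every prediction is 0, so the cost is (4 + 16 + 36) / (2 * 3)
cost_at_zero = compute_cost(X_toy, y_toy, np.zeros((2, 1)))

# With the true parameters (intercept 0, slope 2) the cost drops to zero
cost_at_fit = compute_cost(X_toy, y_toy, np.array([[0.0], [2.0]]))

print(cost_at_zero, cost_at_fit)
```

The same pattern holds on the real dataset: the cost at theta = 0 is large, and a good fit brings it down.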

Finding the optimal parameters using Gradient Descent

def gradientDescent(X, y, theta, alpha, iterations):
    for _ in range(iterations):
        temp = np.dot(X, theta) - y # prediction errors
        temp = np.dot(X.T, temp) # gradient of the cost, up to the 1/m factor
        theta = theta - (alpha/m) * temp # simultaneous update of all parameters
    return theta
theta = gradientDescent(X, y, theta, alpha, iterations)
print(theta)

Expected theta values [-3.6303, 1.1664]
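A useful sanity check for gradient descent is to record the cost at every iteration and confirm that it keeps going down. Here is a rough sketch of that idea on made-up data generated from y = 1 + 2x. The function name gradient_descent_with_history and the values alpha=0.1 and iterations=2000 are my own choices for this toy problem, not values from the course.

```python
import numpy as np

def gradient_descent_with_history(X, y, theta, alpha, iterations):
    """Batch gradient descent that also records the cost after each update."""
    m = len(y)
    history = []
    for _ in range(iterations):
        residuals = X @ theta - y
        theta = theta - (alpha / m) * (X.T @ residuals)
        history.append(float(np.sum((X @ theta - y) ** 2) / (2 * m)))
    return theta, history

# Made-up data on the line y = 1 + 2x, with an intercept column of ones
X_toy = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y_toy = np.array([[3.0], [5.0], [7.0]])

theta_fit, history = gradient_descent_with_history(
    X_toy, y_toy, np.zeros((2, 1)), alpha=0.1, iterations=2000
)
print(theta_fit.ravel())          # should be close to intercept 1, slope 2
print(history[0], history[-1])    # cost at the start vs. at the end
```

If the recorded cost ever increases, the learning rate is probably too large for the data.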

We now have the optimized value of theta. Plug it back into the cost function above.

J = computeCost(X, y, theta)
print(J)

It should give you a value of 4.483, which is much better than 32.07.

Plot showing the best fit line

plt.scatter(X[:,1], y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.plot(X[:,1], np.dot(X, theta))
plt.show()
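Once theta is fitted, making a prediction is just one more dot product. As a small sketch, the code below hard-codes the rounded theta values quoted earlier and estimates the profit for a hypothetical city of 35,000 people. The variable names x_new and predicted_profit are only for illustration, and your own theta may differ slightly in the later decimals.

```python
import numpy as np

# Rounded parameters quoted above (intercept, slope)
theta = np.array([[-3.6303], [1.1664]])

# Population is measured in units of 10,000, so 35,000 people becomes 3.5.
# The leading 1 multiplies the intercept term.
x_new = np.array([[1.0, 3.5]])

# Profit is measured in units of $10,000, so scale back to dollars
predicted_profit = (x_new @ theta).item() * 10_000
print(round(predicted_profit))
```

With these rounded parameters the model predicts a profit of roughly $4,500 for such a city.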

Let’s extend the idea of linear regression to work with multiple independent variables.

Linear Regression with multiple variables

In this section, we will implement linear regression with multiple variables (also called Multivariate Linear Regression).

Problem context:
Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices. Your job is to predict housing prices based on other variables.

The file ex1data2.txt (available under week 2’s assignment material) contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

The infrastructure we built in the previous section carries over almost unchanged. Here we will reuse the same equations.

import numpy as np
import pandas as pd
data = pd.read_csv('ex1data2.txt', sep = ',', header = None)
X = data.iloc[:,0:2] # read first two columns into X
y = data.iloc[:,2] # read the third column into y
m = len(y) # no. of training samples
data.head()

As can be seen above, we are dealing with more than one independent variable here (but the concepts you learnt in the previous section apply here as well).

Feature Normalization

By looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, first performing feature scaling can make gradient descent converge much more quickly.

Our task here is to:

Subtract the mean value of each feature from the dataset.
After subtracting the mean, additionally scale (divide) the feature values by their respective “standard deviations.”

X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=0) # per-column mean and std; ddof=0 matches np.std's default
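One practical detail: the mean and standard deviation used for scaling need to be kept, because any new example must be scaled with the same values before you make a prediction. Here is a minimal sketch on four made-up houses. The helper feature_normalize and the names mu, sigma and x_new are assumptions for illustration.

```python
import numpy as np

def feature_normalize(X):
    """Scale each column to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)   # population standard deviation (ddof=0), NumPy's default
    return (X - mu) / sigma, mu, sigma

# Made-up houses: [size in square feet, number of bedrooms]
X_toy = np.array([
    [2000.0, 3.0],
    [1600.0, 3.0],
    [2400.0, 4.0],
    [1400.0, 2.0],
])

X_norm, mu, sigma = feature_normalize(X_toy)
print(X_norm.mean(axis=0), X_norm.std(axis=0))

# A new house must be scaled with the SAME mu and sigma as the training data
x_new = (np.array([1650.0, 3.0]) - mu) / sigma
print(x_new)
```

After scaling, both columns are on a comparable range, which is what lets a single learning rate work well for every parameter.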

Adding the intercept term and initializing parameters

(The code below is similar to what we did in the previous section.)

ones = np.ones((m,1))
X = np.hstack((ones, X))
alpha = 0.01
num_iters = 400
theta = np.zeros((3,1))
y = y.to_numpy()[:,np.newaxis] # pandas Series -> NumPy column vector of shape (m,1)

Computing the cost

def computeCostMulti(X, y, theta):
    temp = np.dot(X, theta) - y
    return np.sum(np.power(temp, 2)) / (2*m)
J = computeCostMulti(X, y, theta)
print(J)

You should expect to see a cost of 65591548106.45744.

Finding the optimal parameters using Gradient Descent

def gradientDescentMulti(X, y, theta, alpha, iterations):
    m = len(y)
    for _ in range(iterations):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - (alpha/m) * temp
    return theta
theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
print(theta)

Your optimal parameters should be approximately [[334302.06399328],[ 99411.44947359], [3267.01285407]].

We now have the optimized value of theta. Plug it back into the cost function above.

J = computeCostMulti(X, y, theta)
print(J)

This should give you a value of 2105448288.6292474, which is much better than 65591548106.45744.

You have now learnt how to perform Linear Regression with one or more independent variables. Well done!

That’s it for this post. Give me a clap (or several claps) if you liked my work.

Other articles in this series:

Logistic Regression (part 2.1)
Regularized Logistic Regression (part 2.2)

Written by Srikar