# Python Implementation of Andrew Ng’s Machine Learning Course (Part 1)

A few months ago I had the opportunity to complete Andrew Ng’s Machine Learning MOOC on Coursera. It serves as a very good introduction for anyone who wants to venture into the world of AI/ML. But there’s a catch: the course is taught in Octave.

I always wondered how amazing this course could be if it were in Python. I finally decided to retake the course, but this time I would complete the programming assignments in Python.

In this series of blog posts, I plan to write about the Python versions of the programming exercises used in the course. I’m doing this for a few reasons:

- It will help anyone who wants a Python version of the course (myself included)
- It will hopefully benefit R users who want to learn the Pythonic implementation of algorithms they are already familiar with

**Pre-requisites**

- It is highly recommended that you first watch the week 1 video lectures.
- You should have basic familiarity with the Python ecosystem.

In this section, we will look at the simplest Machine Learning algorithms.

**Linear Regression with One Variable**

First, some context on the problem statement.

Here we will implement linear regression with one variable to predict profits for a food truck. Suppose you are the CEO of a restaurant franchise and are considering different cities for opening a new outlet. The chain already has trucks in various cities and you have data for profits and populations from the cities.

The file ex1data1.txt (available under week 2's assignment material) contains the dataset for our linear regression exercise. The first column is the population of a city and the second column is the profit of a food truck in that city. A negative value for profit indicates a loss.

First, as with any machine learning task, we need to import the required libraries.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

**Reading and Plotting the data**

Before starting on any task, it is often useful to understand the data by visualizing it. For this dataset, you can use a scatter plot to visualize the data, since it has only two properties to plot (profit and population).

*(Many other problems that you will encounter in real life are multi-dimensional and can’t be plotted on a 2-d plot. To create multidimensional plots you have to be creative in using various aesthetics like colors, shapes, depths, etc).*

```python
data = pd.read_csv('ex1data1.txt', header=None)  # read the dataset
X = data.iloc[:, 0]  # read first column (population)
y = data.iloc[:, 1]  # read second column (profit)
m = len(y)           # number of training examples
data.head()          # view first few rows of the data
```

Here we used the pandas `read_csv` function to read the comma-separated values, and the `head` function to view the first few rows of our data.

```python
plt.scatter(X, y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.show()
```

**Adding the intercept term**

In the following lines, we add another dimension to our data to accommodate the intercept term (the reason for doing this is explained in the video lectures). We also initialize the parameters `theta` to `0` and the learning rate `alpha` to `0.01`.

```python
X = X[:, np.newaxis]
y = y[:, np.newaxis]
theta = np.zeros([2, 1])
iterations = 1500
alpha = 0.01
ones = np.ones((m, 1))
X = np.hstack((ones, X))  # adding the intercept term
```

**Note on np.newaxis:** When you read the data into `X` and `y`, you will observe that they are rank 1 arrays. A rank 1 array has a shape of `(m,)`, whereas a rank 2 array has a shape of `(m, 1)`. When operating on arrays, it’s good to convert rank 1 arrays to rank 2 arrays, because rank 1 arrays often give unexpected results. To convert a rank 1 array to a rank 2 array, we use `someArray[:, np.newaxis]`.
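A quick illustration of the rank 1 vs rank 2 difference (a minimal sketch with a toy array):

```python
import numpy as np

a = np.zeros(5)          # rank 1 array
print(a.shape)           # (5,)
print(a.T.shape)         # still (5,) — transposing a rank 1 array does nothing

b = a[:, np.newaxis]     # promote to rank 2 (a column vector)
print(b.shape)           # (5, 1)
```

The fact that `a.T` silently does nothing is exactly the kind of surprise that makes rank 1 arrays risky in matrix arithmetic.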

Next we will compute the cost and run gradient descent. Both are explained very well by Andrew Ng in the video lectures; I am only providing Python code for the pseudocode he uses.
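For reference, these are the formulas from the lectures that the code implements. The cost function for linear regression is

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta^T x
```

and each gradient descent iteration updates every parameter simultaneously via

```latex
\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

In the vectorized Python code below, the sum over training examples is carried out by the matrix products `np.dot(X, theta)` and `np.dot(X.T, temp)`.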

**Computing the cost**

```python
def computeCost(X, y, theta):
    temp = np.dot(X, theta) - y               # errors: predictions minus targets
    return np.sum(np.power(temp, 2)) / (2 * m)

J = computeCost(X, y, theta)
print(J)
```

You should expect to see a cost of `32.07`.

**Finding the optimal parameters using Gradient Descent**

```python
def gradientDescent(X, y, theta, alpha, iterations):
    for _ in range(iterations):
        temp = np.dot(X, theta) - y    # errors
        temp = np.dot(X.T, temp)       # gradient of the cost (times m)
        theta = theta - (alpha / m) * temp
    return theta

theta = gradientDescent(X, y, theta, alpha, iterations)
print(theta)
```

Expected `theta` values: `[-3.6303, 1.1664]`

We now have the optimized value of `theta`. Use this value in the above cost function.

```python
J = computeCost(X, y, theta)
print(J)
```

It should give you a value of `4.483`, which is much better than `32.07`.

**Plot showing the best fit line**

```python
plt.scatter(X[:, 1], y)
plt.xlabel('Population of City in 10,000s')
plt.ylabel('Profit in $10,000s')
plt.plot(X[:, 1], np.dot(X, theta))
plt.show()
```
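With the fitted line in hand, you can also use `theta` to predict the profit for a new city. A minimal sketch, standalone and using the expected `theta` values from above (remember that population is in units of 10,000s and profit in units of $10,000s):

```python
import numpy as np

theta = np.array([[-3.6303], [1.1664]])  # fitted values from the text

# Predict profit for a city of 35,000 people (3.5 in units of 10,000s);
# the leading 1.0 is the intercept term, matching the training setup.
x = np.array([[1.0, 3.5]])
profit = np.dot(x, theta).item() * 10000  # convert back to dollars
print(profit)
```

The prediction comes out to roughly $4,500, i.e. the model expects a small positive profit for a city of that size.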

Let’s extend the idea of linear regression to work with multiple independent variables.

**Linear Regression with multiple variables**

In this section, we will implement linear regression with multiple variables (also called Multivariate Linear Regression).

Problem context:

Suppose you are selling your house and you want to know what a good market price would be. One way to do this is to first collect information on recent houses sold and make a model of housing prices. Your job is to predict housing prices based on other variables.

The file ex1data2.txt (available under week 2’s assignment material) contains a training set of housing prices in Portland, Oregon. The first column is the size of the house (in square feet), the second column is the number of bedrooms, and the third column is the price of the house.

You already have the necessary infrastructure from the previous section, and it can easily be applied here as well. We will simply reuse the same equations.

```python
import numpy as np
import pandas as pd

data = pd.read_csv('ex1data2.txt', sep=',', header=None)
X = data.iloc[:, 0:2]  # read first two columns into X
y = data.iloc[:, 2]    # read the third column into y
m = len(y)             # number of training examples
data.head()
```

As can be seen above, we are dealing with more than one independent variable here (but the concepts you learnt in the previous section apply here as well).

**Feature Normalization**

Looking at the values, note that house sizes are about 1000 times the number of bedrooms. When features differ by orders of magnitude, performing feature scaling first can make gradient descent converge much more quickly.

Our task here is to:

- Subtract the mean value of each feature from the dataset.
- After subtracting the mean, additionally scale (divide) the feature values by their respective “standard deviations.”

```python
X = (X - np.mean(X)) / np.std(X)
```
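As a quick sanity check, here is the same normalization on a small NumPy array (a toy sketch with made-up rows in the style of ex1data2.txt; `mean` and `std` with `axis=0` operate column-wise, which is what happens per feature in the pandas version above):

```python
import numpy as np

X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 3.0],
              [1416.0, 2.0]])  # illustrative (size, bedrooms) rows

X_norm = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_norm.mean(axis=0))  # each feature's mean is now ~0
print(X_norm.std(axis=0))   # each feature's std is now ~1
```

After scaling, both features live on comparable ranges, so a single learning rate works well for all parameters.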

**Adding the intercept term and initializing parameters**

(the code below is similar to what we did in the previous section)

```python
ones = np.ones((m, 1))
X = np.hstack((ones, X))
alpha = 0.01
num_iters = 400
theta = np.zeros((3, 1))
y = y[:, np.newaxis]
```

**Computing the cost**

```python
def computeCostMulti(X, y, theta):
    temp = np.dot(X, theta) - y
    return np.sum(np.power(temp, 2)) / (2 * m)

J = computeCostMulti(X, y, theta)
print(J)
```

You should expect to see a cost of `65591548106.45744`.

**Finding the optimal parameters using Gradient Descent**

```python
def gradientDescentMulti(X, y, theta, alpha, iterations):
    m = len(y)
    for _ in range(iterations):
        temp = np.dot(X, theta) - y
        temp = np.dot(X.T, temp)
        theta = theta - (alpha / m) * temp
    return theta

theta = gradientDescentMulti(X, y, theta, alpha, num_iters)
print(theta)
```

Your optimal parameters will be `[[334302.06399328], [99411.44947359], [3267.01285407]]`

We now have the optimized value of `theta`. Use this value in the above cost function.

```python
J = computeCostMulti(X, y, theta)
print(J)
```

This should give you a value of `2105448288.6292474`, which is much better than `65591548106.45744`.
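One practical caveat when using the learned `theta` for prediction: a new input must be normalized with the same mean and standard deviation computed on the training set, then augmented with the intercept term. A minimal standalone sketch with hypothetical normalization statistics (in practice, `mu` and `sigma` come from your training data; the `theta` values are the ones reported above, rounded):

```python
import numpy as np

theta = np.array([[334302.06], [99411.45], [3267.01]])  # values from the text

# Hypothetical per-feature statistics for illustration; in practice,
# save these when normalizing the training set (mu = np.mean(X), sigma = np.std(X)).
mu = np.array([2000.0, 3.2])
sigma = np.array([790.0, 0.76])

x = np.array([1650.0, 3.0])         # a 1650 sq-ft, 3-bedroom house
x_norm = (x - mu) / sigma           # same scaling as the training data
x_aug = np.hstack(([1.0], x_norm))  # add the intercept term
price = np.dot(x_aug, theta).item()
print(price)
```

The exact number depends on your dataset’s statistics; the point is that skipping the normalization step on new inputs would give wildly wrong predictions.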

You have now learnt how to perform linear regression with one or more independent variables. Well done!

That’s it for this post. Give me a clap (or several claps) if you liked my work.

You can find the other articles in this series here:

- Logistic Regression (part 2.1)
- Regularized Logistic Regression (part 2.2)