‘Machine Learning’ course by Andrew Ng: Recoding with Python — Part 1

su_sandy
May 17, 2022


Image credit: Work illustration vector created by macrovector — www.freepik.com (https://www.freepik.com/vectors/work-illustration)

I took the amazing “Machine Learning” course by Andrew Ng, offered by Stanford on Coursera. The course was created about 10 years ago, and its programming exercises were designed to be completed in Octave/MATLAB.

After finishing the course, I wondered: why not redo these exercises in Python, which is now among the top three programming languages? So, in this series of articles, I will use Python to redo the programming exercises from the course. My hope is that this will help other learners of the course use Python as an alternative while working through the exercises.

The first exercise is to train a linear regression model with a single feature. The given dataset has two columns: the population of a city and the profit of a food truck in that city, with 97 records in total. The goal is to build a model that helps the CEO of the food truck company decide which city to expand to next. Below is the step-by-step guide.

Step 1: Plotting the data

The NumPy and pandas libraries are used to read the data, separate the feature column and the target column, and convert both into arrays (for later calculations). Then, matplotlib is used to draw a scatter plot to get an idea of the correlation between the population and the profit.

# importing the file (comma-separated, no header row)
import pandas as pd
import numpy as np

data = pd.read_csv('ex1data1.txt', header=None)
data.head()

Below is a glimpse of the data.

population = data.iloc[:, 0]   # population column
profit = data.iloc[:, 1]       # profit column
m = len(population)            # number of training examples

x_ = population.to_numpy().reshape(m, 1)
y = profit.to_numpy().reshape(m, 1)

# plotting x_ and y
import matplotlib.pyplot as plt
plt.scatter(x_, y, marker='x', color='red', s=10)
plt.ylabel('Profit in $10,000')
plt.xlabel('Population of the City in 10,000')

The resulting graph is as below. We can see that there is a strong, positive correlation between the population and the profit.
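As a quick numerical check of that impression, we can compute the Pearson correlation coefficient with NumPy (a small sketch that assumes the x_ and y arrays created above):

# correlation between population and profit
# (assumes the x_ and y arrays built in the previous snippet)
r = np.corrcoef(x_.ravel(), y.ravel())[0, 1]
print("Correlation coefficient:", r)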

Step 2: Gradient Descent Function

Our hypothesis hϴ(x) is given by the linear model:

hϴ(x) = ϴᵀx = ϴ0 + ϴ1·x1

We will use batch gradient descent to adjust the ϴ values so as to minimize the cost function J(ϴ). Each iteration updates ϴ and moves it toward the global minimum, where the error (cost) is smallest. The update rule is:

ϴj := ϴj − (α/m) · Σ ( hϴ(x⁽ⁱ⁾) − y⁽ⁱ⁾ ) · xj⁽ⁱ⁾

where the sum runs over the m training examples and ϴ0, ϴ1 are updated simultaneously.
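In vectorized form, with x as the m×2 design matrix (a bias column plus the population column, built in the next step) and y as the m×1 target vector, a single update can be written very compactly. This is only a sketch of the idea; the full gradient descent function below implements the same step with explicit transposes:

# one vectorized gradient descent step (sketch)
# x: m x 2 design matrix with a bias column, y: m x 1 targets, theta: 2 x 1
def gradient_step(x, y, theta, alpha):
    m = len(y)
    grad = np.dot(x.T, np.dot(x, theta) - y) / m   # 2 x 1 gradient of J
    return theta - alpha * grad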

The first thing to do is add a column of ones (the bias/intercept term) to our array x_ and set some initial values:

# add a column of ones to x_ for theta0 (the intercept term)
x = np.concatenate((np.ones((m, 1)), x_), axis=1)

# initialize the fitting parameters: both 0
theta_start = np.zeros((2, 1))

# number of iterations and learning rate
iteration = 1500
alpha = 0.01

Next, a function is built to calculate the cost J(ϴ) = (1/2m) · Σ ( hϴ(x⁽ⁱ⁾) − y⁽ⁱ⁾ )². It takes in the x, y and ϴ arrays and returns the cost of using the input ϴ as the parameters for linear regression.

def J(x, y, theta):
    m = len(y)
    x_prime = np.transpose(x)          # shape (2, m)
    theta_prime = np.transpose(theta)  # shape (1, 2)
    y_prime = np.transpose(y)          # shape (1, m)

    # sum of squared prediction errors over all examples
    value = np.sum(np.power(np.dot(theta_prime, x_prime) - y_prime, 2))
    return value / (2 * m)
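Before moving on, it is worth calling J once with the all-zero ϴ (using the x, y and theta_start arrays defined above). For this dataset, the course exercise quotes an expected initial cost of about 32.07:

# sanity check of the cost function with the initial parameters
initial_cost = J(x, y, theta_start)
print("Cost with theta = [0, 0]:", initial_cost)   # should be roughly 32.07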

Once we have a function for J(ϴ), we can build a gradient descent function that takes in ϴ, x, y, the learning rate α and the number of iterations, and updates ϴ in small steps. When called, this function prints J and ϴ at every iteration (purely as a check that J decreases each time; the print can be omitted) and returns the final ϴ values.

def gradient_des(x, y, theta, alpha, num_iters):
    m = len(y)
    J_history = np.zeros((num_iters, 1))   # cost recorded at every iteration
    x_prime = np.transpose(x)              # shape (2, m)
    y_prime = np.transpose(y)              # shape (1, m)

    for i in range(num_iters):
        # perform a single gradient step on the parameter vector theta
        theta_p = np.transpose(theta)                  # shape (1, 2)
        term1 = np.dot(theta_p, x_prime) - y_prime     # prediction errors, shape (1, m)
        term2 = np.dot(term1, x)                       # gradient as a row, shape (1, 2)
        term3 = np.transpose(term2)                    # gradient as a column, shape (2, 1)
        theta = theta - (alpha / m) * term3

        # record the cost at each iteration
        J_history[i] = J(x, y, theta)
        print("Theta = {}, J = {}, for iteration {}".format(theta, J_history[i], i))
    return theta

Now it is time to test our functions and see the results.

# let's run gradient descent now
theta = np.array(gradient_des(x, y, theta_start, alpha, iteration))
print("Theta computed from gradient descent: {}, {}".format(theta[0], theta[1]))

Here is a glimpse of the output and the final values of ϴ:
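With the learned ϴ in hand, we can already use the model to make predictions. The original exercise asks for the expected profit in cities of 35,000 and 70,000 people; here is a short sketch of that (remember that the population feature is in units of 10,000 and the profit in units of $10,000):

# predict profit for populations of 35,000 and 70,000
predict1 = np.dot(np.array([1, 3.5]), theta)
predict2 = np.dot(np.array([1, 7.0]), theta)
print("For population = 35,000, predicted profit: ${:.2f}".format(predict1[0] * 10000))
print("For population = 70,000, predicted profit: ${:.2f}".format(predict2[0] * 10000))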

Step 3: Plotting the Linear Regression Line

On the previous scatter plot, we can add the linear regression line, calculated as the dot product of our feature matrix x and ϴ.

plt.scatter(x_, y, marker='x', color='red', s=10)
plt.ylabel('Profit in $10,000')
plt.xlabel('Population of the City in 10,000')
plt.plot(x[:, 1], np.dot(x, theta), '-')
plt.legend(['Training Data', 'Linear Regression'])

Step 4: Visualizing J(ϴ) [optional]

Let’s draw a surface plot and a contour plot of J(ϴ) over a 2D grid of ϴ0 and ϴ1 values using matplotlib. This gives a better picture of the shape of the cost function J(ϴ) and the location of the global minimum.

Surface Plot:

import matplotlib.pyplot as plt
from matplotlib import cm
import numpy as np

fig, ax = plt.subplots(subplot_kw={"projection": "3d"})

# values of theta over which we will calculate J
theta0 = np.linspace(-10, 10, 100)
theta1 = np.linspace(-1, 4, 100)
X, Y = np.meshgrid(theta0, theta1)

Z = np.zeros((len(theta0), len(theta1)))
for i in range(len(theta0)):
    for j in range(len(theta1)):
        t = [theta0[i], theta1[j]]
        Z[i, j] = J(x, y, t)

# transpose Z so its rows follow theta1 (the Y axis) and its columns follow theta0 (the X axis)
ax.plot_surface(X, Y, np.transpose(Z), cmap=cm.coolwarm, linewidth=0, antialiased=False)
plt.xlabel('theta0')
plt.ylabel('theta1')

Contour Plot:

X, Y = np.meshgrid(theta0, theta1)
plt.contour(X, Y, np.transpose(Z), np.logspace(-2, 3, 20))
plt.xlabel('theta_0')
plt.ylabel('theta_1')
plt.plot(theta[0], theta[1], 'rx', markersize=10, linewidth=2)

Here, the red cross is our final theta value.

And those are all the steps for fitting a linear regression model to data with a single feature!

Part 2 will cover building a linear regression model for data with more than one feature, using gradient descent and, as an alternative, the normal equation.

Keep Learning. Enjoy the journey!
