‘Machine Learning’ course by Andrew Ng: Recoding with Python — Part 3

su_sandy
Jun 7, 2022


Image credit: Homework vector created by svstudioart — www.freepik.com (https://www.freepik.com/vectors/homework)

This is the third article in this series, in which I recode the exercises from the (old) Machine Learning course by Andrew Ng, where the programming exercises are done in Octave. My intention in writing these articles is to help learners of this course use Python as an alternative while doing the exercises. Please also feel free to explore Part 1 and Part 2.

In Part 3, we will build a logistic regression model to predict whether a student gets admitted to a university. Suppose we are the administrator of a university department and we want to determine each applicant’s chance of admission based on their results on two exams. We already have historical data with previous applicants’ scores on these exams and their admission results.

1. Getting to know the data

The given dataset contains 100 records with two feature columns (the exam1 and exam2 scores) and one target column (the admission result). As usual, I will load the data and visualize it to get an idea of what it looks like.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# load the comma-separated data (no header row) and name the columns
data = pd.read_table('ex2data1.txt', header=None, sep=',')
data.columns = ['exam1', 'exam2', 'admission']
data.head()

Below is a snippet of the dataset (the first few rows returned by data.head()):

In the code below, the ‘plotdata()’ function draws a scatter plot of our data. From the plot, we can see that, roughly, the admitted students have a combined score (the sum of the exam1 and exam2 scores) of more than 100, while those with a lower combined score are mostly not admitted.

def plotdata(data):
    # indices of admitted (admission = 1) and not-admitted (admission = 0) applicants
    one_index = data.index[data['admission'] == 1].tolist()
    zero_index = data.index[data['admission'] == 0].tolist()
    plt.plot(data.iloc[one_index, 0], data.iloc[one_index, 1], 'k+', linewidth=2, markersize=7, label='Admitted')
    plt.plot(data.iloc[zero_index, 0], data.iloc[zero_index, 1], 'ko', markersize=7, label='Not admitted')
    plt.xlabel('Exam 1 score')
    plt.ylabel('Exam 2 score')
    plt.legend()
    # plt.show() is left to the caller so that more elements (e.g., a decision
    # boundary) can be added to the same figure later

plotdata(data)
plt.show()

2. Sigmoid Function

The logistic regression hypothesis is defined as:

hθ(x) = g(θᵀx)

where the function g is the sigmoid function, defined as:

g(z) = 1 / (1 + e^(−z))

Below is the sigmoid function coded in Python:

# the function 'sigmoid' computes the sigmoid of each value of z
# (z can be a matrix, a vector, or a scalar)
def sigmoid(z):
    g = 1 / (1 + np.exp(-z))
    return g
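As a quick sanity check (a minimal sketch of my own, not part of the original exercise), sigmoid(0) should return 0.5, large positive inputs should approach 1, and the function should work element-wise on NumPy arrays:

# quick sanity check of the sigmoid implementation
print(sigmoid(0))                          # 0.5
print(sigmoid(np.array([-10, 0, 10])))     # approximately [0.0000454, 0.5, 0.9999546]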

3. Cost Function and Gradient

The cost function in logistic regression is defined as:

J(θ) = (1/m) Σᵢ₌₁..ₘ [ −y⁽ⁱ⁾ log(hθ(x⁽ⁱ⁾)) − (1 − y⁽ⁱ⁾) log(1 − hθ(x⁽ⁱ⁾)) ]

where m is the number of training examples. The gradient of the cost is a vector whose elements are defined as:

∂J(θ)/∂θⱼ = (1/m) Σᵢ₌₁..ₘ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾

for j = 0, 1, 2, …, n, where n is the number of features.

Now, let’s define x and y from our dataset. For x, we will take out the first two columns from our data frame ‘data’ and add a column of ones for the bias term.

# getting total number of training examples
m = data.shape[0]
# define x and y
y = data.iloc[:, 2].to_numpy().reshape(m,1)
x = np.concatenate((np.ones((m,1)), data.iloc[:, :2]), axis = 1)
# initialize theta
theta_start = np.zeros((x.shape[1], 1))

We will define two separate functions to calculate the cost and the gradient.

# this function will compute the cost of using theta as the parameter for logistic regression
def J(theta, x, y):
    m = len(y)
    y_prime = np.transpose(y)
    h_theta = sigmoid(np.dot(x, theta))

    J = (-np.dot(y_prime, np.log(h_theta)) - np.dot(np.transpose(1 - y), np.log(1 - h_theta))) / m
    J = np.sum(J)

    return J

# gradient of the cost
def Gradient(theta, x, y):
    m = len(y)
    x_prime = np.transpose(x)

    error = sigmoid(np.dot(x, theta)) - y

    grad = np.dot(x_prime, error) / m
    grad = grad.flatten()
    return grad
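As a quick check before optimizing (my own addition, not part of the original write-up): with θ initialized to zeros, the hypothesis is 0.5 for every example, so the initial cost should be −log(0.5) ≈ 0.693.

# with theta = 0, h_theta(x) = 0.5 for every example, so J should be about 0.693
print(J(theta_start, x, y))          # expect roughly 0.693
print(Gradient(theta_start, x, y))   # gradient at the initial theta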

4. Learning Parameters using Newton-Conjugate-Gradient

Instead of taking gradient descent steps for a fixed number of iterations, we will use the ‘fmin_ncg()’ function (unconstrained minimization using the Newton-CG method) from scipy.optimize. Note: for logistic regression, there is no constraint on the theta values (i.e., theta can take any real value). You can read more about the function in the SciPy documentation.

The required parameters for this function are:

  1. An objective function to be minimized. (this is our cost function J(ϴ)).
  2. Initial guess of the parameters to be optimized i.e., our initialized ϴ values.
  3. Gradient of J(ϴ)
  4. maxiter: maximum number of iterations to perform

The function will return the optimized ϴ values (together with other information).

import scipy.optimize as opt

theta_start = np.zeros((x.shape[1], 1))
result = opt.fmin_ncg(J, x0=theta_start, fprime=Gradient, args=(x, y.flatten()), maxiter=400)
print("Cost at theta found by fmin_ncg() : {}".format(J(result, x, y)))
print('Theta : {}'.format(result))

Below is the output. (The cost at the θ found by the optimizer should be approximately 0.203.)
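As a side note (my own addition; the original exercise uses fmin_ncg), the same optimization can also be run through the more general scipy.optimize.minimize interface with method='Newton-CG', which returns the optimized θ in the result’s x attribute:

# equivalent call using the general minimize() interface (Newton-CG method)
res = opt.minimize(J, x0=theta_start.flatten(), args=(x, y.flatten()),
                   method='Newton-CG', jac=Gradient, options={'maxiter': 400})
print("Cost : {}".format(res.fun))
print("Theta : {}".format(res.x))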

5. Plotting the Boundary Line

I will write another function that calls our previous ‘plotdata()’ function and adds a boundary line to the plot. The decision boundary is the line where θ₀ + θ₁x₁ + θ₂x₂ = 0, so for any x₁ value the corresponding x₂ on the boundary is x₂ = −(θ₀ + θ₁x₁) / θ₂.

def plotDB(data, theta, x, y):
    plotdata(data)

    # take the min and max values of x1 (the exam 1 score)
    x_values = [np.min(x[:, 1]), np.max(x[:, 1])]
    # use these x1 values to calculate the corresponding x2 values on the boundary
    y_values = - (theta[0] + np.dot(theta[1], x_values)) / theta[2]

    plt.plot(x_values, y_values, label='Decision Boundary')
    plt.legend()

plotDB(data, result, x, y)
plt.show()

The final graph with the boundary line.

6. Evaluating the model

In this step, we can use our model to predict the probability that a particular student will be admitted. Note: don’t forget to add the bias term (a leading 1) to the input x, as in the example below.
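As a minimal sketch (the exam scores 45 and 85 below are the illustrative values used in the original Octave exercise), the admission probability for a single student is just the sigmoid of the bias-augmented score vector dotted with the learned θ:

# predicted admission probability for a student with exam scores 45 and 85
student = np.array([1, 45, 85])                     # the leading 1 is the bias term
prob = sigmoid(np.dot(student, result))
print("Admission probability : {}".format(prob))    # roughly 0.776 for these scores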

Furthermore, we can evaluate the quality of the parameters we have found by asking the model to predict our training set data.

# Function for predicting
# this will predict whether the label is 1 or 0 using the learned logistic regression parameters theta
def predict(theta, x):
    m = len(x)
    p = np.zeros((m, 1))
    for i in range(m):
        sig = sigmoid(np.dot(x[i], theta))
        if sig > 0.5:
            p[i] = 1
        else:
            p[i] = 0
    return p

p = predict(result, x)
print("Train Accuracy : {}%".format(float(np.mean(p == y) * 100)))

From this, our model’s accuracy on the training set is 89%.

That is all for this part. The next part will be about regularized logistic regression, where we will try to create a non-linear decision boundary and will therefore need to add a regularization term to prevent overfitting.

Keep Learning. Enjoy the journey!
