Python Implementation of Andrew Ng’s Machine Learning Course (Part 2.1)

Srikar · Published in Analytics Vidhya · 6 min read · Sep 4, 2018

In my previous post, we discussed the Pythonic implementation of linear regression with single and multiple independent variables, covering the week 1 and week 2 programming assignments. Now we will move on to the week 3 content, i.e., Logistic Regression.

Since this is going to be a pretty lengthy post, I am dividing it into two parts. Watch out for Part 2.2, which looks into how to combat the overfitting problem.

If you are new here, I would encourage you to read my previous post first:

Python Implementation of Andrew Ng’s Machine Learning Course (Part 1)

Pre-requisites

It’s highly recommended that first you watch the week 3 video lectures.

Should have basic familiarity with the Python ecosystem.

Here we will look into one of the most widely used ML algorithm in the industry.

Logistic Regression

In this part of the exercise, you will build a logistic regression model to predict whether a student gets admitted into a university.

Problem context

Suppose that you are the administrator of a university department and you want to determine each applicant’s chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant’s scores on two exams and the admissions decision.

Your task is to build a classification model that estimates an applicant’s probability of admission based on the scores from those two exams.

First let’s load the necessary libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.optimize as opt # more on this later

Next, we read the data (the necessary data file is available under the week 3 content).

data = pd.read_csv('ex2data1.txt', header=None)  # the file has no header row
X = data.iloc[:, :-1]  # exam 1 and exam 2 scores
y = data.iloc[:, 2]    # admission decision (0 or 1)
data.head()

So we have two independent features and one dependent variable. Here 0 means the candidate did not get admitted and 1 means the candidate did.

Visualizing the data

Before starting to implement any learning algorithm, it is always good to visualize the data if possible.

mask = y == 1  # boolean mask selecting the admitted students
adm = plt.scatter(X[mask][0].values, X[mask][1].values)
not_adm = plt.scatter(X[~mask][0].values, X[~mask][1].values)
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend((adm, not_adm), ('Admitted', 'Not admitted'))
plt.show()

Implementation

Before you start with the actual cost function, recall that the logistic regression hypothesis makes use of the sigmoid function. Let's define our sigmoid function first.

Sigmoid Function
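For reference, the hypothesis used by logistic regression and the sigmoid (logistic) function it relies on are:

$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$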

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

Note that we are writing vectorized code here, so it really doesn't matter whether x is a scalar, a vector, a matrix, or a tensor ;-). Writing and understanding vectorized code takes some mind bending at first (anyone gets good at it with practice), but it gets rid of explicit for loops and makes for efficient, generalized code.
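As a quick sanity check (a small illustration, assuming the sigmoid function defined above is already in scope), you can verify this behaviour directly:

print(sigmoid(0))                           # scalar input  -> 0.5
print(sigmoid(np.array([-1.0, 0.0, 1.0])))  # vector input  -> [0.269 0.5 0.731]
print(sigmoid(np.zeros((2, 2))))            # matrix input  -> 2x2 array of 0.5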

Cost Function

Let’s implement the cost function for the Logistic Regression.

def costFunction(theta, X, y):
    m = len(y)  # number of training examples
    J = (-1 / m) * np.sum(np.multiply(y, np.log(sigmoid(X @ theta)))
                          + np.multiply(1 - y, np.log(1 - sigmoid(X @ theta))))
    return J

Note that we have used the sigmoid function in the costFunction above.

There are multiple ways to code the cost function; what matters more is the underlying mathematical idea and our ability to translate it into code.
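For example, here is one equivalent formulation (a sketch, assuming theta, X and y are NumPy arrays with y flattened to 1-D) that computes the hypothesis once and uses dot products instead of element-wise sums:

def costFunctionAlt(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)  # compute the hypothesis once and reuse it
    return (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m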

Gradient Function
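For reference, the gradient of the cost with respect to each parameter is:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)}$$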

def gradient(theta, X, y):
    m = len(y)  # number of training examples
    return (1 / m) * (X.T @ (sigmoid(X @ theta) - y))

Note that while this gradient looks identical to the linear regression gradient, the formula is actually different because linear and logistic regression use different hypothesis functions.
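Concretely, the two hypotheses are:

$$h_\theta(x) = \theta^T x \;\;\text{(linear regression)} \qquad\text{vs.}\qquad h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}} \;\;\text{(logistic regression)}$$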

Let’s call these functions using the initial parameters.

(m, n) = X.shape
X = np.hstack((np.ones((m, 1)), X))  # add the intercept column of ones
y = y.values[:, np.newaxis]          # reshape y into an (m, 1) column vector
theta = np.zeros((n + 1, 1))         # initialize theta with all zeros
J = costFunction(theta, X, y)
print(J)

This should give us a value of about 0.693 for J. (With theta initialized to zeros, the hypothesis is 0.5 for every example, so J = -log(0.5) ≈ 0.693.)

Learning parameters using fmin_tnc

In the previous assignment, we found the optimal parameters of a linear regression model by implementing the gradient descent algorithm ourselves: we wrote a cost function, calculated its gradient, and took gradient descent steps accordingly. This time, instead of taking gradient descent steps, we will use the built-in function fmin_tnc from the scipy library.

fmin_tnc is an optimization solver that finds the minimum of a function; here we use it without any constraints. For logistic regression, we want to minimize the cost function with respect to the parameters theta.

Constraints in optimization often refer to constraints on the parameters, for example constraints that bound the possible values theta can take (e.g., theta ≤ 1). Logistic regression has no such constraints, since theta is allowed to take any real value.

Concretely, you are going to use fmin_tnc to find the best or optimal parameters theta for the logistic regression cost function, given a fixed dataset (of X and y values). You will pass to fmin_tnc the following inputs:

  • The initial values of the parameters we are trying to optimize.
  • Functions that, given the training set and a particular theta, compute the logistic regression cost and its gradient with respect to theta for the dataset (X, y).

temp = opt.fmin_tnc(func=costFunction,
                    x0=theta.flatten(), fprime=gradient,
                    args=(X, y.flatten()))
# the output of the above function is a tuple whose first element
# contains the optimized values of theta
theta_optimized = temp[0]
print(theta_optimized)

Note on the flatten() function: unfortunately, scipy's fmin_tnc does not work well with column or row vectors; it expects the parameters as a flat 1-D array. The flatten() method collapses a column or row vector into that format.
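A quick illustration (a small sketch using a hypothetical 3 x 1 column vector):

theta_col = np.zeros((3, 1))
print(theta_col.shape)            # (3, 1) -- a column vector
print(theta_col.flatten().shape)  # (3,)   -- the flat 1-D array fmin_tnc expects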

The above code should give [-25.16131862, 0.20623159, 0.20147149].

If you have implemented costFunction correctly, fmin_tnc will converge to the right parameters and return the final values of theta. Notice that by using fmin_tnc, you did not have to write any loops yourself or set a learning rate like you did for gradient descent. This is all done by fmin_tnc :-) You only needed to provide functions for calculating the cost and the gradient.

Let's use these optimized theta values to calculate the cost.

J = costFunction(theta_optimized[:,np.newaxis], X, y)
print(J)

You should see a value of 0.203. Compare this with the cost of 0.693 obtained using the initial theta.

Plotting Decision Boundary (Optional)

This final theta value will now be used to plot the decision boundary on the training data, resulting in a figure similar to the one produced by the code below.
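The decision boundary is the set of points where the hypothesis equals 0.5, i.e., where $\theta^T x = 0$. Solving for the exam 2 score gives the line we plot:

$$x_2 = -\frac{1}{\theta_2}\big(\theta_0 + \theta_1 x_1\big)$$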

plot_x = np.array([np.min(X[:, 1]) - 2, np.max(X[:, 1]) + 2])  # exam 1 score range
plot_y = -1 / theta_optimized[2] * (theta_optimized[0]
                                    + theta_optimized[1] * plot_x)
mask = y.flatten() == 1
adm = plt.scatter(X[mask][:, 1], X[mask][:, 2])
not_adm = plt.scatter(X[~mask][:, 1], X[~mask][:, 2])
decision_boun = plt.plot(plot_x, plot_y)
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend((adm, not_adm), ('Admitted', 'Not admitted'))
plt.show()

It looks like our model does a pretty good job of distinguishing the students who got admission from those who didn't. Now let's quantify our model's accuracy, for which we will write a function aptly named accuracy.

def accuracy(X, y, theta, cutoff):
    pred = sigmoid(X @ theta) >= cutoff  # predict 1 when the probability is at least the cutoff
    acc = np.mean(pred == y)
    print(acc * 100)

accuracy(X, y.flatten(), theta_optimized, 0.5)

This should give us an accuracy score of 89%. Hmm… not bad.
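As one more check (a small sketch; the exam scores of 45 and 85 come from the original assignment, which expects an admission probability of roughly 0.776), you can use the learned parameters to predict the admission probability for a new student:

# predict the admission probability for a student with exam scores 45 and 85
new_student = np.array([1, 45, 85])  # the leading 1 is the intercept term
prob = sigmoid(new_student @ theta_optimized)
print(prob)  # roughly 0.776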

You have now learnt how to perform logistic regression. Well done!

That’s it for this post. Give me a clap (or several claps) if you liked my work.

You can find the next post in this series here.
