# Python Implementation of Andrew Ng’s Machine Learning Course (Part 2.1)

In my previous post we discussed a Pythonic implementation of linear regression with single and multiple independent variables, as part of the week 1 and week 2 programming assignments. Now we will move on to the week 3 content, i.e., logistic regression.

Since this is going to be a pretty lengthy post, I am going to divide it into two parts. Watch out for Part 2.2, which looks into how to combat the overfitting problem.

If you are new here, I would encourage you to read my previous post:

Python Implementation of Andrew Ng’s Machine Learning Course (Part 1)

**Prerequisites**

It’s highly recommended that you first watch the week 3 video lectures.

You should have basic familiarity with the Python ecosystem.

Here we will look into one of the most widely used ML algorithms in the industry.

**Logistic Regression**

In this part of the exercise, you will build a logistic regression model to predict whether a student gets admitted into a university.

**Problem context**

Suppose that you are the administrator of a university department and you want to determine each applicant’s chance of admission based on their results on two exams. You have historical data from previous applicants that you can use as a training set for logistic regression. For each training example, you have the applicant’s scores on two exams and the admissions decision.

Your task is to build a classification model that estimates an applicant’s probability of admission based on the scores from those two exams.

First let’s load the necessary libraries.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.optimize as opt  # more on this later
```

Next, we read the data (the necessary data file is available under the week 3 content).

```python
data = pd.read_csv('ex2data1.txt', header=None)
X = data.iloc[:, :-1]  # the two exam scores (columns 0 and 1)
y = data.iloc[:, 2]    # the admission decision (0 or 1)
data.head()
```

So we have two independent features and one dependent variable. Here `0` means the candidate was not admitted and `1` means the candidate was admitted.

**Visualizing the data**

Before starting to implement any learning algorithm, it is always good to visualize the data if possible.

```python
mask = y == 1
adm = plt.scatter(X[mask][0].values, X[mask][1].values)
not_adm = plt.scatter(X[~mask][0].values, X[~mask][1].values)
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend((adm, not_adm), ('Admitted', 'Not admitted'))
plt.show()
```

**Implementation**

Before you start with the actual cost function, recall that the logistic regression hypothesis makes use of the sigmoid function. Let’s define our sigmoid function.
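For reference, here is the hypothesis in the notation of the lectures. Stacking the training examples as rows of `X` (with the intercept column of ones), the vector of predictions is $g(X\theta)$, which is exactly what `sigmoid(X @ theta)` computes in the code that follows.

$$
h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}
$$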

**Sigmoid Function**

```python
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
```

Note that here we are writing vectorized code, so it doesn’t matter whether `x` is a scalar, a vector, a matrix or a tensor ;-). Of course, writing and understanding vectorized code takes some mind bending (which anyone can get good at with practice). However, it gets rid of *for loops* and makes the code both efficient and general.
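A quick way to see the vectorization claim in action is to call the same `sigmoid` on inputs of different shapes. The comparison with `scipy.special.expit` (SciPy’s built-in logistic function) is an extra sanity check added here, not part of the original assignment.

```python
import numpy as np
from scipy.special import expit

def sigmoid(x):
    # Same vectorized definition as above: np.exp works element-wise on any shape
    return 1 / (1 + np.exp(-x))

print(sigmoid(0))                           # 0.5 for a scalar
print(sigmoid(np.array([-1.0, 0.0, 1.0])))  # element-wise on a vector
print(sigmoid(np.zeros((2, 3))).shape)      # (2, 3): the shape of a matrix is preserved

# scipy.special.expit computes the same logistic function
v = np.linspace(-5, 5, 11)
print(np.allclose(sigmoid(v), expit(v)))    # True
```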

**Cost Function**

Let’s implement the cost function for logistic regression.

```python
def costFunction(theta, X, y):
    m = len(y)              # number of training examples
    h = sigmoid(X @ theta)  # predicted probabilities
    J = (-1/m) * np.sum(np.multiply(y, np.log(h))
                        + np.multiply((1 - y), np.log(1 - h)))
    return J
```

Note that we have used the `sigmoid` function in the `costFunction` above.

There are multiple ways to code the cost function. What’s more important is the underlying mathematical idea and our ability to translate it into code.
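For reference, this is the cross-entropy cost that `costFunction` implements, with $h_\theta(x) = g(\theta^T x)$ and $m$ training examples. Summing over $i$ is what `np.sum` does over the element-wise products in the code.

$$
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log\big(h_\theta(x^{(i)})\big) + \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big]
$$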

**Gradient Function**

```python
def gradient(theta, X, y):
    m = len(y)  # number of training examples
    return (1/m) * X.T @ (sigmoid(X @ theta) - y)
```

Note that while this gradient looks identical to the linear regression gradient, the formula is actually different because linear and logistic regression define the hypothesis differently: here the predictions are `sigmoid(X @ theta)` rather than `X @ theta`.
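One way to convince yourself that `gradient` really is the derivative of `costFunction` is a finite-difference check. The sketch below is self-contained: it re-defines both functions (computing `m` from `y`), uses a small random dataset as an assumed stand-in for the exam data, and introduces a hypothetical helper `numerical_gradient`. The analytic result `(1/m) * X.T @ (sigmoid(X @ theta) - y)` should agree with the central-difference estimate to many decimal places.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def costFunction(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    return (-1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    m = len(y)
    return (1 / m) * X.T @ (sigmoid(X @ theta) - y)

def numerical_gradient(f, theta, eps=1e-6):
    # Hypothetical helper: central differences (f(t + eps*e_j) - f(t - eps*e_j)) / (2*eps)
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return grad

# Small random dataset (assumption): intercept column plus two features
rng = np.random.default_rng(0)
X = np.hstack((np.ones((20, 1)), rng.normal(size=(20, 2))))
y = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(size=3)

analytic = gradient(theta, X, y)
numeric = numerical_gradient(lambda t: costFunction(t, X, y), theta)
print(np.max(np.abs(analytic - numeric)))  # should be very small
```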

Let’s call these functions using the initial parameters.

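```python
(m, n) = X.shape
X = np.hstack((np.ones((m, 1)), X))  # add the intercept column of ones
y = y.values[:, np.newaxis]          # pandas Series -> (m, 1) column vector
theta = np.zeros((n + 1, 1))         # initializing theta with all zeros
J = costFunction(theta, X, y)
print(J)
```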

This should give us a value of `0.693` for J.
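The value `0.693` is no coincidence. With $\theta = \mathbf{0}$, every prediction is $g(0) = \tfrac12$, and since $y^{(i)} + (1 - y^{(i)}) = 1$ each bracket in the cost collapses to $\log\tfrac12$:

$$
J(\mathbf{0}) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y^{(i)}\log\tfrac12 + \big(1-y^{(i)}\big)\log\tfrac12\Big] = -\log\tfrac12 = \ln 2 \approx 0.693
$$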

**Learning parameters using fmin_tnc**

In the previous assignment, we found the optimal parameters of a linear regression model by implementing the gradient descent algorithm: we wrote a cost function, calculated its gradient, and then took gradient descent steps accordingly. This time, instead of taking the gradient descent steps ourselves, we will use the built-in function `fmin_tnc` from the `scipy` library.

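`fmin_tnc` is an optimization solver based on a truncated Newton method. It accepts optional bounds on the parameters, and when none are given (as here) it finds the minimum of an unconstrained function. For logistic regression, we want to minimize the cost function with respect to the parameters `theta`.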

Constraints in optimization often refer to constraints on the parameters, for example constraints that bound the possible values `theta` can take (e.g., `theta` ≤ 1). Logistic regression does not have such constraints, since `theta` is allowed to take any real value.

Concretely, you are going to use `fmin_tnc` to find the optimal parameters `theta` for the logistic regression cost function, given a fixed dataset (of X and y values). You will pass `fmin_tnc` the following inputs:

- The initial values of the parameters we are trying to optimize (`x0`).
- A function that, when given the training set and a particular `theta`, computes the logistic regression cost for the dataset (X, y) (`func`).
- A function that computes the gradient of that cost with respect to `theta` (`fprime`).
- The training data itself, which is passed on to both functions (`args`).

```python
temp = opt.fmin_tnc(func=costFunction,
                    x0=theta.flatten(),
                    fprime=gradient,
                    args=(X, y.flatten()))
# the output of the above function is a tuple whose first element
# contains the optimized values of theta
theta_optimized = temp[0]
print(theta_optimized)
```

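*Note on the `flatten()` function:* Unfortunately, scipy’s `fmin_tnc` doesn’t work well with column or row vectors. It expects the parameters as a 1-D array, and the `flatten()` function reduces a column or row vector to that format. We flatten `y` in `args` for the same reason, so that it has the same 1-D shape as `X @ theta` inside the solver.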
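The shapes involved are easy to check directly. This small sketch uses toy arrays (assumptions, not the exam data) to show what `flatten()` and `[:, np.newaxis]` do, and why mixing an `(m,)` prediction vector with an `(m, 1)` label vector is dangerous: NumPy broadcasting silently produces an `(m, m)` matrix instead of an element-wise difference.

```python
import numpy as np

theta = np.zeros((3, 1))               # column vector, shape (3, 1)
theta_flat = theta.flatten()           # 1-D array, shape (3,), the form passed as x0
theta_col = theta_flat[:, np.newaxis]  # back to shape (3, 1)
print(theta.shape, theta_flat.shape, theta_col.shape)  # (3, 1) (3,) (3, 1)

# Mixing (m,) and (m, 1) broadcasts instead of subtracting element-wise
h = np.full(4, 0.5)                 # shape (4,), like sigmoid(X @ theta) for a 1-D theta
y_col = np.ones((4, 1))             # shape (4, 1), like the column-vector y
print((h - y_col).shape)            # (4, 4), not (4,)
print((h - y_col.flatten()).shape)  # (4,)
```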

The above code should give `[-25.16131862, 0.20623159, 0.20147149]`.

If you have completed `costFunction` and `gradient` correctly, `fmin_tnc` will converge on the right optimization parameters and return the final values of `theta`. Notice that by using `fmin_tnc`, you did not have to write any loops yourself or set a learning rate like you did for gradient descent. This is all done by `fmin_tnc` :-) You only needed to provide functions for calculating the cost and the gradient.
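Recent SciPy documentation lists `fmin_tnc` among its legacy interfaces and points to `scipy.optimize.minimize` with `method='TNC'` as the general entry point. The sketch below shows that alternative with the same cost and gradient functions; because it must run on its own, it uses a synthetic dataset drawn from an assumed logistic model as a stand-in for `ex2data1.txt`, so its numbers will not match the ones above.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def costFunction(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    return (-1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    m = len(y)
    return (1 / m) * X.T @ (sigmoid(X @ theta) - y)

# Synthetic stand-in for the exam data (assumption): labels drawn from a known logistic model
rng = np.random.default_rng(42)
features = rng.normal(size=(200, 2))
X = np.hstack((np.ones((200, 1)), features))
true_theta = np.array([-0.5, 2.0, -1.0])
y = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)

# Same truncated Newton solver, called through minimize; jac supplies the analytic gradient
res = minimize(costFunction, x0=np.zeros(3), args=(X, y),
               method='TNC', jac=gradient)
print(res.x)    # optimized theta (1-D array, like temp[0] from fmin_tnc)
print(res.fun)  # cost at the optimum, below the ln 2 we start from
```

The returned `OptimizeResult` bundles the parameters (`res.x`), the final cost (`res.fun`) and a convergence flag (`res.success`), which can be easier to read than the positional tuple from `fmin_tnc`.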

Let’s use these optimized `theta` values to calculate the cost.

```python
J = costFunction(theta_optimized[:, np.newaxis], X, y)
print(J)
```

You should see a value of `0.203`. Compare this with the cost of `0.693` obtained using the initial `theta`.

**Plotting Decision Boundary (Optional)**

This final `theta` value will then be used to plot the decision boundary on the training data, resulting in a figure similar to the one below.
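The boundary is a straight line because the model predicts "admitted" when $h_\theta(x) \ge 0.5$, i.e. when $\theta^T x \ge 0$. Setting $\theta^T x = 0$ with $x_1$ = Exam 1 score and $x_2$ = Exam 2 score and solving for $x_2$ gives the formula used for `plot_y` in the code:

$$
\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0 \quad\Longrightarrow\quad x_2 = -\frac{1}{\theta_2}\left(\theta_0 + \theta_1 x_1\right)
$$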

```python
# Two x-coordinates spanning the Exam 1 scores are enough to draw a straight line
plot_x = np.array([np.min(X[:, 1]) - 2, np.max(X[:, 1]) + 2])
# Solve theta0 + theta1*x1 + theta2*x2 = 0 for x2
plot_y = -1 / theta_optimized[2] * (theta_optimized[0] + theta_optimized[1] * plot_x)

mask = y.flatten() == 1
adm = plt.scatter(X[mask][:, 1], X[mask][:, 2])
not_adm = plt.scatter(X[~mask][:, 1], X[~mask][:, 2])
decision_boun = plt.plot(plot_x, plot_y)
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend((adm, not_adm), ('Admitted', 'Not admitted'))
plt.show()
```

It looks like our model does a pretty good job of distinguishing the students who got admitted from those who didn’t. Now let’s quantify our model’s accuracy, for which we will write a function aptly called `accuracy`.

```python
def accuracy(X, y, theta, cutoff):
    pred = sigmoid(X @ theta) >= cutoff  # boolean predictions, shape (m,)
    acc = np.mean(pred == y)             # fraction of correct predictions
    print(acc * 100)

accuracy(X, y.flatten(), theta_optimized, 0.5)
```

This should give us an accuracy score of `89%`. Hmm… not bad.
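The trained model can also be queried for a single new applicant. This sketch uses a hypothetical helper `predict_proba` and hard-codes the `theta` values reported above; the intercept term is prepended by hand, mirroring the column of ones added with `np.hstack` earlier. For exam scores of 45 and 85, θᵀx ≈ 1.244, so the predicted admission probability comes out at about 0.776.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# theta values reported above for ex2data1.txt (replace with your own output)
theta_optimized = np.array([-25.16131862, 0.20623159, 0.20147149])

def predict_proba(exam1, exam2, theta):
    # Hypothetical helper: prepend the intercept term 1, then apply the hypothesis
    x = np.array([1.0, exam1, exam2])
    return sigmoid(x @ theta)

p = predict_proba(45, 85, theta_optimized)
print(round(p, 3))  # 0.776
```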

You have now learnt how to perform logistic regression. Well done!

That’s it for this post. Give me a clap (or several claps) if you liked my work.

You can find the next post in this series here.