‘Machine Learning’ course by Andrew Ng: Recoding with Python — Part4: Regularized Logistic Regression

su_sandy
Jun 18, 2022 · 5 min read


Image Ref: "https://www.freepik.com/vectors/computer-cartoon" Computer cartoon vector created by svstudioart — www.freepik.com

This is the fourth article in this series, in which I recode the exercises from the (old) Machine Learning course by Andrew Ng (where the programming exercises are done in Octave). My intention in writing these articles is to help learners of this course use Python as an alternative while doing the exercises. Feel free to explore the previous parts in this series as well:
Part1: Linear Regression model with one feature
Part2: Linear Regression with multiple features
Part3: (Unregularized) Logistic Regression model

In this Part4, we will build a regularized logistic regression model to predict whether microchips from a fabrication plant pass the quality assurance (QA) test. Suppose we are the product manager of the factory and we have the results of two different tests for a number of microchips. From these two test scores, we would like to determine whether each microchip should be accepted or rejected.

1. Getting to know the data

The dataset given contains 118 records with two features (Test1 score and Test2 score) and 1 target column (here, 1 means the microchip passes the QA test and is accepted while 0 means it’s rejected). As usual, I will load the data and try to visualize the data to get an idea of what it looks like.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_table('ex2data2.txt', header=None, sep=',')
data.columns = ['Test1', 'Test2', 'Score']

# plot the two test scores, using different markers for accepted (1) and rejected (0) chips
def plotdata(data):
    one_index = data.index[data['Score'] == 1].tolist()
    zero_index = data.index[data['Score'] == 0].tolist()
    plt.plot(data.iloc[one_index, 0], data.iloc[one_index, 1], 'k+', linewidth=2, markersize=7)
    plt.plot(data.iloc[zero_index, 0], data.iloc[zero_index, 1], 'ko', markersize=7)
    plt.xlabel('Test1')
    plt.ylabel('Test2')
    plt.legend(['Accepted', 'Rejected'])
    plt.show()

plotdata(data)
The accepted microchips are plotted as plus symbols and the rejected chips as black dots.

From the above plot, we can see that the two classes cannot be separated by a straight line, so a linear decision boundary will not fit this data well.

2. Feature Mapping

One way to fit the data better is to create more features from each data point. We will map the two features into all polynomial terms of x1 and x2 up to the sixth power (1, x1, x2, x1², x1x2, x2², …, x2⁶). The mapfeature() function below performs this mapping:

# This function maps the two input features to polynomial features (up to the given degree)
# used in the regularization exercise.
# It returns a new feature array with more features, including the bias column.
# Inputs x1 and x2 must be of the same size.
def mapfeature(x1, x2, degree):
    m = x1.shape[0]
    # the bias term on the first column
    result = np.ones((m, 1))

    for i in range(1, degree + 1):
        for j in range(i + 1):
            out = np.multiply(np.power(x1, (i - j)), np.power(x2, j)).reshape(m, 1)
            result = np.concatenate((result, out), axis=1)

    return result

m = len(data.iloc[:, 2])
y = data.iloc[:, 2].to_numpy().reshape(m, 1)
x_ = data.iloc[:, :2].to_numpy()
x = mapfeature(x_[:, 0], x_[:, 1], 6)

The final mapped result x is now a (118, 28) matrix. A logistic regression classifier trained on this higher-dimensional feature matrix will have a more complex decision boundary. With feature mapping we can build a more expressive classifier, but overfitting can become a problem.
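As a quick sanity check (this snippet is not part of the original exercise), we can confirm the column count: mapping two features up to degree d produces (d + 1)(d + 2)/2 terms including the bias column, which is 28 for d = 6.

# optional sanity check: number of polynomial terms up to degree d, including the bias column
degree = 6
print(x.shape)                           # expected: (118, 28)
print((degree + 1) * (degree + 2) // 2)  # expected: 28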

3. Cost Function and Gradient (with Regularization)

The regularized cost function of logistic regression is:
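In LaTeX notation, with h_θ(x) = g(θᵀx) denoting the sigmoid hypothesis, m the number of training examples, and n the number of features:

J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\Big[-y^{(i)}\log\big(h_\theta(x^{(i)})\big) - \big(1-y^{(i)}\big)\log\big(1-h_\theta(x^{(i)})\big)\Big] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2

Note that the bias parameter θ_0 is excluded from the regularization sum.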

And the gradient of the cost function is defined as:
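Written out per parameter, the bias term is treated separately because it is not regularized:

\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_0^{(i)} \qquad \text{for } j = 0

\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\big(h_\theta(x^{(i)}) - y^{(i)}\big)\,x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad \text{for } j \ge 1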

Below is the Python code for the sigmoid function, the regularized cost function, and the gradient function.

# sigmoid (logistic) function
def sigmoid(z):
    g = 1 / (1 + np.exp(-z))
    return g

# this function computes the cost of using theta as the parameter for regularized logistic regression
def J_reg(theta, x, y, lambdaa):
    m = len(y)
    y_prime = np.transpose(y)
    h_theta = sigmoid(np.dot(x, theta))

    J = (-np.dot(y_prime, np.log(h_theta)) - np.dot(np.transpose(1 - y), np.log(1 - h_theta))
         + 0.5 * lambdaa * np.sum(np.power(theta[1:], 2))) / m
    J = np.sum(J)

    return J

# this function computes the gradient of the regularized cost
# (the bias term theta[0] is not regularized)
def Gradient(theta, x, y, lambdaa):
    m = len(y)

    error = sigmoid(np.dot(x, theta)) - y
    grad = np.zeros((x.shape[1], 1))

    for i in range(x.shape[1]):
        if i == 0:
            grad[i] = np.dot(np.transpose(x[:, i]), error) / m
        else:
            grad[i] = np.dot(np.transpose(x[:, i]), error) / m + theta[i] * lambdaa / m

    return grad.flatten()
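As a side note, the explicit loop in Gradient() can be replaced by a single matrix multiplication. Below is a minimal vectorized sketch (Gradient_vec is a new helper name, not part of the exercise), assuming theta and y are passed as 1-D arrays, which is how fmin_ncg() will call it with the arguments we use later:

# vectorized equivalent of Gradient() above; assumes theta and y are 1-D arrays
def Gradient_vec(theta, x, y, lambdaa):
    m = len(y)
    error = sigmoid(np.dot(x, theta)) - y      # prediction error, shape (m,)
    grad = np.dot(np.transpose(x), error) / m  # unregularized gradient, shape (n,)
    reg = (lambdaa / m) * theta                # regularization term for every parameter
    reg[0] = 0                                 # the bias term theta[0] is not regularized
    return grad + reg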

4. Learning parameters using Newton-Conjugate Gradient

I have used this fmin_ncg() function (an unconstrained minimization using the Newton-CG method) from scipy.optimize in my previous logistic regression model (without regularization) as well. The details of the function can be read here.

The required parameters for this function are:

  1. An objective function to be minimized. (this is our cost function J(ϴ)).
  2. Initial guess of the parameters to be optimized i.e., our initialized ϴ values.
  3. Gradient of J(ϴ)
  4. maxiter: maximum number of iterations to perform

The function will return the optimized ϴ values (together with other information).

import scipy.optimize as opt

theta_start = np.zeros((x.shape[1], 1))
lambdaa = 1
result = opt.fmin_ncg(J_reg, x0=theta_start, fprime=Gradient, args=(x, y.flatten(), lambdaa), maxiter=400)

print("Cost at theta found by fmin_ncg() : {}\n".format(J_reg(result, x, y, lambdaa)))
print('Theta : {}'.format(result))

The result is as below:

5. Prediction Accuracy

Let’s evaluate the quality of the parameters we have found by using the model to predict the labels of our training set.

# Function for prediction:
# this will predict whether the label is 1 or 0 using the learned logistic regression parameters theta
def predict(theta, x):
    m = len(x)
    p = np.zeros((m, 1))
    for i in range(m):
        sig = sigmoid(np.dot(x[i], theta))
        if sig > 0.5:
            p[i] = 1
        else:
            p[i] = 0
    return p

p = predict(result, x)
print("Train Accuracy : {:.1f}%".format(float(np.mean(p == y) * 100)))

From this, the model classifies 98 of the 118 training examples correctly, which corresponds to a train accuracy of about 83%.
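As a side note, the per-row loop in predict() can also be written as a single vectorized expression. The sketch below (p_vec is a new name used only for this comparison) should give the same predictions and accuracy:

# vectorized alternative to predict(); result is the 1-D theta returned by fmin_ncg()
p_vec = (sigmoid(np.dot(x, result)) > 0.5).astype(float).reshape(-1, 1)
print("Train Accuracy : {:.1f}%".format(float(np.mean(p_vec == y) * 100)))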

This is all for this part. In the upcoming Part5, we will build multiple one-vs-all logistic regression models to implement a multi-class classifier and compare its performance with that of a simple neural network.

Keep Learning. Enjoy the journey!
