Coding Logistic Regression in Python From Scratch
The aim is to code logistic regression for binary classification from scratch, using only the underlying mathematics.
This is the second part of a series on Logistic Regression:
- Fundamentals of Logistic Regression
- Coding Logistic Regression in Python From Scratch
First we need to import NumPy, a library for fast numerical computation on arrays. You can learn about NumPy here.
import numpy as np
To package the different methods, we create a class called “MyLogisticRegression”. The arguments taken by the class are:
learning_rate -- determines the step size taken by the gradient descent algorithm
num_iterations -- determines the number of times the gradient descent update is run
class MyLogisticRegression:
    def __init__(self, learning_rate = 1, num_iterations = 2000):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.w = []
        self.b = 0
We are all set to go; first, the foundations for the main algorithms have to be laid.
def initialize_weight(self, dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    """
    w = np.zeros((dim, 1))
    b = 0
    return w, b
Sigmoid
The most basic and essential element of logistic regression is the logistic function, also called the sigmoid.
def sigmoid(self, z):
    """
    Compute the sigmoid of z
    Argument:
    z -- the linear combination w.T X + b (a scalar or numpy array)
    """
    s = 1 / (1 + np.exp(-z))
    return s
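As a quick sanity check (a standalone sketch, reproducing the same formula outside the class), the sigmoid maps 0 to exactly 0.5 and squashes large positive and negative inputs toward 1 and 0:

```python
import numpy as np

def sigmoid(z):
    # Same formula as the class method, reproduced standalone for testing
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))                        # 0.5
print(sigmoid(np.array([-10.0, 10.0])))  # values very close to 0 and to 1
```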
Hypothesis
Now we write a function to define the hypothesis. The superscript T over ‘w’ stands for the transpose of the weight vector.
def hypothesis(self, w, X, b):
    """
    This function calculates the hypothesis for the present model
    Argument:
    w -- weight vector
    X -- The input matrix, one example per column
    b -- The bias, a scalar
    """
    H = self.sigmoid(np.dot(w.T, X) + b)
    return H
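To see the shapes involved, here is a standalone sketch with made-up sizes: with w of shape (n, 1) and X of shape (n, m), w.T X + b has shape (1, m), one probability per example. With zero-initialized weights, every probability starts at 0.5:

```python
import numpy as np

n, m = 3, 5                      # hypothetical sizes: 3 features, 5 examples
w = np.zeros((n, 1))             # zero-initialized weights
X = np.arange(n * m).reshape(n, m).astype(float)
b = 0
H = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))
print(H.shape)   # (1, 5)
print(H)         # all 0.5, since w = 0 and b = 0 give z = 0 everywhere
```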
Cost Function and Gradients
The cost function measures how well the model fits the given data, while the gradients tell us in which direction to move the parameters to reduce that cost.
def cost(self, H, Y, m):
    """
    This function calculates the cost of the hypothesis
    Arguments:
    H -- The hypothesis vector
    Y -- The output vector
    m -- Number of training samples
    """
    cost = -np.sum(Y*np.log(H) + (1-Y)*np.log(1-H))/m
    cost = np.squeeze(cost)
    return cost
def cal_gradient(self, w, H, X, Y):
    """
    Calculates the gradient of the cost with respect to w and b
    """
    m = X.shape[1]
    dw = np.dot(X, (H - Y).T)/m
    db = np.sum(H - Y)/m
    grads = {"dw": dw,
             "db": db}
    return grads
def gradient_position(self, w, b, X, Y):
    """
    Calls the helper functions above to get the current cost and gradients
    Arguments:
    w -- weights, a numpy array of size (no. of features, 1)
    b -- bias, a scalar
    X -- data of size (no. of features, number of examples)
    Y -- true "label" vector (containing 0 or 1) of size (1, number of examples)
    """
    m = X.shape[1]
    H = self.hypothesis(w, X, b)           # compute activation
    cost = self.cost(H, Y, m)              # compute cost
    grads = self.cal_gradient(w, H, X, Y)  # compute gradient
    return grads, cost
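Worked on tiny made-up numbers (the values of X, Y, and H below are purely illustrative), the same cost and gradient formulas give:

```python
import numpy as np

# Hypothetical tiny batch: 2 features, 3 examples
X = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
Y = np.array([[0, 1, 1]])
H = np.array([[0.2, 0.9, 0.6]])   # pretend these came from the hypothesis
m = X.shape[1]

cost = -np.sum(Y*np.log(H) + (1-Y)*np.log(1-H))/m
dw = np.dot(X, (H - Y).T)/m       # shape (2, 1), matching w
db = np.sum(H - Y)/m              # a scalar

print(round(float(cost), 4))      # 0.2798
print(dw.ravel())                 # approximately [-0.4, -0.1667]
print(db)                         # approximately -0.1
```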
Gradient Descent Algorithm
Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters (w, b) of our model.
The significance of this algorithm lies in the update rule, which moves each parameter by the learning rate times its current gradient.
def gradient_descent(self, w, b, X, Y, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    Arguments:
    w -- weights, a numpy array of size (no. of features, 1)
    b -- bias, a scalar
    X -- data of size (no. of features, number of examples)
    Y -- true "label" vector (containing 0 or 1) of size (1, number of examples)
    print_cost -- True to print the cost every 100 steps
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization; this will be used to plot the learning curve
    """
    costs = []
    for i in range(self.num_iterations):
        # Cost and gradient calculation
        grads, cost = self.gradient_position(w, b, X, Y)
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        # Update rule
        w = w - (self.learning_rate * dw)
        b = b - (self.learning_rate * db)
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))
    params = {"w": w,
              "b": b}
    grads = {"dw": dw,
             "db": db}
    return params, grads, costs
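Condensed to its essentials, the loop above can be checked on a tiny linearly separable toy set (hypothetical data; with enough iterations the learned boundary separates the two classes):

```python
import numpy as np

# Toy 1-feature dataset: the label is 1 when the feature exceeds 2
X = np.array([[0.0, 1.0, 3.0, 4.0]])
Y = np.array([[0, 0, 1, 1]])
w, b = np.zeros((1, 1)), 0.0
lr = 0.5                          # hypothetical learning rate
m = X.shape[1]

for _ in range(2000):
    H = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))  # forward pass
    w -= lr * np.dot(X, (H - Y).T) / m           # update rule for w
    b -= lr * np.sum(H - Y) / m                  # update rule for b

H = 1 / (1 + np.exp(-(np.dot(w.T, X) + b)))
print((H >= 0.5).astype(int))    # recovers the labels [[0 0 1 1]]
```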
Predict
The hypothesis function gives us the probability that y = 1. We convert this into a hard prediction by assigning p = 1 whenever h is at least 0.5, and p = 0 otherwise.
def predict(self, X):
    '''
    Predict whether the label is 0 or 1 using the learned logistic regression parameters (self.w, self.b)
    Arguments:
    X -- data of size (no. of features, number of examples)
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    X = np.array(X)
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = self.w.reshape(X.shape[0], 1)
    b = self.b
    # Compute vector "H"
    H = self.hypothesis(w, X, b)
    for i in range(H.shape[1]):
        # Convert probabilities H[0,i] to actual predictions Y_prediction[0,i]
        if H[0, i] >= 0.5:
            Y_prediction[0, i] = 1
        else:
            Y_prediction[0, i] = 0
    return Y_prediction
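As an aside, the element-wise loop can be replaced by a single vectorized comparison that behaves identically (shown here on made-up probabilities):

```python
import numpy as np

H = np.array([[0.1, 0.5, 0.7, 0.49]])    # hypothetical probabilities
Y_prediction = (H >= 0.5).astype(float)  # threshold at 0.5 in one step
print(Y_prediction)                      # [[0. 1. 1. 0.]]
```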
Train Model Function
This method is called directly by the user to train the model; it is the main public entry point of the class.
def train_model(self, X_train, Y_train, X_test, Y_test, print_cost = False):
    """
    Builds the logistic regression model by calling the functions implemented above
    Arguments:
    X_train -- training set represented by a numpy array of shape (features, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (features, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    print_cost -- Set to True to print the cost every 100 iterations
    Returns:
    d -- dictionary containing information about the model
    """
    # Initialize parameters with zeros
    dim = np.shape(X_train)[0]
    w, b = self.initialize_weight(dim)
    # Gradient descent
    parameters, grads, costs = self.gradient_descent(w, b, X_train, Y_train, print_cost)
    # Retrieve parameters w and b from dictionary "parameters"
    self.w = parameters["w"]
    self.b = parameters["b"]
    # Predict test/train set examples
    Y_prediction_test = self.predict(X_test)
    Y_prediction_train = self.predict(X_train)
    # Print train/test errors
    train_score = 100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100
    test_score = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100
    print("train accuracy: {} %".format(train_score))
    print("test accuracy: {} %".format(test_score))
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": self.w,
         "b": self.b,
         "learning_rate": self.learning_rate,
         "num_iterations": self.num_iterations,
         "train accuracy": train_score,
         "test accuracy": test_score}
    return d
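The accuracy expression used above works because |prediction - label| is exactly 1 on misclassified examples and 0 otherwise, so its mean is the error rate. A small illustration on made-up predictions:

```python
import numpy as np

Y = np.array([[0, 1, 1, 0]])        # hypothetical true labels
Y_pred = np.array([[0, 1, 0, 0]])   # one of the four predictions is wrong
accuracy = 100 - np.mean(np.abs(Y_pred - Y)) * 100
print(accuracy)   # 75.0
```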
Testing on a small dataset
#Dataset
X_train = np.array([[5,6,1,3,7,4,10,1,2,0,5,3,1,4],[1,2,0,2,3,3,9,4,4,3,6,5,3,7]])
Y_train = np.array([[0,0,0,0,0,0,0,1,1,1,1,1,1,1]])
X_test = np.array([[2,3,3,3,2,4],[1,1,0,7,6,5]])
Y_test = np.array([[0,0,0,1,1,1]])
First, we call the class with its default values.
clf = MyLogisticRegression()
d = clf.train_model(X_train, Y_train, X_test, Y_test)
print(d["train accuracy"])

#Output
train accuracy: 100.0 %
test accuracy: 100.0 %
100.0
Next, we set a very small learning rate and iteration count.
clf = MyLogisticRegression(0.001, 100)
d = clf.train_model(X_train, Y_train, X_test, Y_test)

#Output
train accuracy: 92.85714285714286 %
test accuracy: 83.33333333333334 %
If you have stayed till the end, do clap. It’ll keep me motivated to write more. Thank you.