Coding Logistic Regression in Python From Scratch

Om Rastogi · Analytics Vidhya · May 13, 2020

The aim is to code logistic regression for binary classification from scratch, using only the mathematical concepts we already have.

This is the second part of a series on Logistic Regression:

  1. Fundamental of Logistic Regression
  2. Coding Logistic Regression in Python From Scratch

First we need to import NumPy, a library for fast numerical computation on large, multi-dimensional arrays.

import numpy as np

To package the different methods, we create a class called “MyLogisticRegression”. The arguments taken by the class are:
learning_rate - determines the step size used by the gradient descent algorithm.
num_iterations - determines how many times the gradient descent update is run.

class MyLogisticRegression:

    def __init__(self, learning_rate = 1, num_iterations = 2000):
        self.learning_rate = learning_rate
        self.num_iterations = num_iterations
        self.w = []
        self.b = 0

We are all set to go; first, the foundations for the main algorithm have to be laid.

    def initialize_weight(self, dim):
        """
        This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
        Argument:
        dim -- size of the w vector we want (or number of parameters in this case)
        """
        w = np.zeros((dim, 1))
        b = 0
        return w, b

Sigmoid

The most basic and essential element of logistic regression is the logistic function, also called the sigmoid.
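For reference, the function implemented below is:

\sigma(z) = \frac{1}{1 + e^{-z}}

It maps any real number to the range (0, 1), which lets us interpret the output as a probability.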

    def sigmoid(self, z):
        """
        Compute the sigmoid of z
        Argument:
        z -- a scalar or numpy array (here, the linear combination w.T X + b)
        """
        s = 1/(1 + np.exp(-z))
        return s

Hypothesis

Now we write a function to compute the hypothesis. The superscript T on ‘w’ stands for the transpose of the weight vector.
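Written out, the hypothesis computed by the function below is:

H = \sigma(w^{T}X + b)

where \sigma is the sigmoid defined above, X holds one example per column, and b is a scalar bias.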

    def hypothesis(self, w, X, b):
        """
        This function calculates the hypothesis for the present model
        Arguments:
        w -- weight vector
        X -- the input matrix (one example per column)
        b -- the bias, a scalar
        """
        H = self.sigmoid(np.dot(w.T, X) + b)
        return H

Cost Function and Gradients

The cost function measures how well the model's predictions match the given data, while the gradients tell us in which direction the parameters should move to reduce that cost.
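The cost implemented below is the average binary cross-entropy over the m training examples:

J(w, b) = -\frac{1}{m}\sum_{i=1}^{m}\left[\, y^{(i)}\log h^{(i)} + (1 - y^{(i)})\log(1 - h^{(i)}) \,\right]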
    def cost(self, H, Y, m):
        """
        This function calculates the cost of the hypothesis
        Arguments:
        H -- The hypothesis vector
        Y -- The vector of true labels
        m -- Number of training samples
        """
        cost = -np.sum(Y*np.log(H) + (1-Y)*np.log(1-H))/m
        cost = np.squeeze(cost)
        return cost
Next, we calculate the gradients of the cost with respect to the weights w and the bias b.
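With X of shape (features, m) and H, Y of shape (1, m), the vectorized gradients computed below are:

\frac{\partial J}{\partial w} = \frac{1}{m}\, X\,(H - Y)^{T}, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(h^{(i)} - y^{(i)}\right)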
    def cal_gradient(self, w, H, X, Y):
        """
        Calculates the gradients of the cost with respect to w and b
        """
        m = X.shape[1]
        dw = np.dot(X, (H-Y).T)/m
        db = np.sum(H-Y)/m
        grads = {"dw": dw,
                 "db": db}
        return grads

    def gradient_position(self, w, b, X, Y):
        """
        Calls the functions above to get the current cost and gradients of the learning model
        Arguments:
        w -- weights, a numpy array of size (no. of features, 1)
        b -- bias, a scalar
        X -- data of size (no. of features, number of examples)
        Y -- true "label" vector (containing 0 or 1) of size (1, number of examples)
        """
        m = X.shape[1]
        H = self.hypothesis(w, X, b)             # compute activation
        cost = self.cost(H, Y, m)                # compute cost
        grads = self.cal_gradient(w, H, X, Y)    # compute gradients
        return grads, cost

Gradient Descent Algorithm

Gradient descent is an optimization algorithm that minimizes a function by iteratively moving in the direction of steepest descent, as defined by the negative of the gradient. In machine learning, we use gradient descent to update the parameters (w, b) of our model.

The heart of the algorithm is the update rule, which moves each parameter in proportion to how steep the cost currently is with respect to that parameter.
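With learning rate \alpha (learning_rate in the code), each iteration applies:

w := w - \alpha\,\frac{\partial J}{\partial w}, \qquad b := b - \alpha\,\frac{\partial J}{\partial b}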

    def gradient_descent(self, w, b, X, Y, print_cost = False):
        """
        This function optimizes w and b by running a gradient descent algorithm

        Arguments:
        w -- weights, a numpy array of size (no. of features, 1)
        b -- bias, a scalar
        X -- data of size (no. of features, number of examples)
        Y -- true "label" vector (containing 0 or 1) of size (1, number of examples)
        print_cost -- True to print the cost every 100 steps

        Returns:
        params -- dictionary containing the weights w and bias b
        grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
        costs -- list of all the costs computed during the optimization, used to plot the learning curve
        """
        costs = []

        for i in range(self.num_iterations):
            # Cost and gradient calculation
            grads, cost = self.gradient_position(w, b, X, Y)

            # Retrieve derivatives from grads
            dw = grads["dw"]
            db = grads["db"]

            # Update rule
            w = w - (self.learning_rate * dw)
            b = b - (self.learning_rate * db)

            # Record the costs
            if i % 100 == 0:
                costs.append(cost)

            # Print the cost every 100 training iterations
            if print_cost and i % 100 == 0:
                print("Cost after iteration %i: %f" % (i, cost))

        params = {"w": w,
                  "b": b}

        grads = {"dw": dw,
                 "db": db}

        return params, grads, costs

Predict

The hypothesis function gives us the probability that y = 1. We convert this probability into a class prediction by assigning p = 1 when h is greater than or equal to 0.5, and p = 0 otherwise.
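In other words, the decision rule implemented below is:

\hat{y} = \begin{cases} 1 & \text{if } h \geq 0.5 \\ 0 & \text{otherwise} \end{cases}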

    def predict(self, X):
        '''
        Predict whether the label is 0 or 1 using the learned logistic regression parameters (self.w, self.b)

        Argument:
        X -- data of size (no. of features, number of examples)

        Returns:
        Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
        '''
        X = np.array(X)
        m = X.shape[1]

        Y_prediction = np.zeros((1, m))

        w = self.w.reshape(X.shape[0], 1)
        b = self.b
        # Compute vector "H" of probabilities
        H = self.hypothesis(w, X, b)

        for i in range(H.shape[1]):
            # Convert probabilities H[0,i] to actual predictions Y_prediction[0,i]
            if H[0,i] >= 0.5:
                Y_prediction[0,i] = 1
            else:
                Y_prediction[0,i] = 0

        return Y_prediction

Train Model Function

This method is called directly by the user to train the model; it is the main public method of the class.

    def train_model(self, X_train, Y_train, X_test, Y_test, print_cost = False):
        """
        Builds the logistic regression model by calling the functions implemented above

        Arguments:
        X_train -- training set represented by a numpy array of shape (no. of features, m_train)
        Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
        X_test -- test set represented by a numpy array of shape (no. of features, m_test)
        Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
        print_cost -- Set to True to print the cost every 100 iterations

        Returns:
        d -- dictionary containing information about the model.
        """
        # Initialize parameters with zeros
        dim = np.shape(X_train)[0]
        w, b = self.initialize_weight(dim)

        # Gradient descent
        parameters, grads, costs = self.gradient_descent(w, b, X_train, Y_train, print_cost)

        # Retrieve parameters w and b from dictionary "parameters"
        self.w = parameters["w"]
        self.b = parameters["b"]

        # Predict test/train set examples
        Y_prediction_test = self.predict(X_test)
        Y_prediction_train = self.predict(X_train)

        # Print train/test accuracy
        train_score = 100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100
        test_score = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100
        print("train accuracy: {} %".format(train_score))
        print("test accuracy: {} %".format(test_score))

        d = {"costs": costs,
             "Y_prediction_test": Y_prediction_test,
             "Y_prediction_train": Y_prediction_train,
             "w": self.w,
             "b": self.b,
             "learning_rate": self.learning_rate,
             "num_iterations": self.num_iterations,
             "train accuracy": train_score,
             "test accuracy": test_score}

        return d

Testing on a small dataset

#Dataset
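# Each column is one example and each row is one feature,
# so X has shape (features, examples) and Y has shape (1, examples)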
X_train = np.array([[5,6,1,3,7,4,10,1,2,0,5,3,1,4],[1,2,0,2,3,3,9,4,4,3,6,5,3,7]])
Y_train = np.array([[0,0,0,0,0,0,0,1,1,1,1,1,1,1]])
X_test = np.array([[2,3,3,3,2,4],[1,1,0,7,6,5]])
Y_test = np.array([[0,0,0,1,1,1]])

First we create a classifier with the default values.

clf = MyLogisticRegression()
d = clf.train_model(X_train, Y_train, X_test, Y_test)
print (d["train accuracy"])
#Output
train accuracy: 100.0 %
test accuracy: 100.0 %
100.0

Next, we'll train with a very small learning rate and a small number of iterations.

clf = MyLogisticRegression(0.001, 100)
d = clf.train_model(X_train, Y_train, X_test, Y_test)
#Output
train accuracy: 92.85714285714286 %
test accuracy: 83.33333333333334 %
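Once trained, the classifier can also score points it has never seen. As a quick illustrative check (X_new below is made up for this example), new inputs just need the same column-per-example layout as the training data:

# Two hypothetical new points, one per column (rows are the two features)
X_new = np.array([[6, 1],
                  [2, 8]])
print(clf.predict(X_new))  # numpy array of shape (1, 2) containing 0/1 predictions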

If you have stayed till the end, do clap. It'll keep me motivated to write more. Thank you.
