Logistic Regression Algorithm from Scratch in Python

Published in Analytics Vidhya, Apr 6, 2020

There are many machine learning packages and frameworks that help you train a model, but they don’t show what happens behind the scenes: what happens to your data at each step and the mathematics involved. For that reason, in this article I am going to implement the logistic regression algorithm from scratch, without any machine learning framework.

The data set I am going to use is the Iris flower data set; you can find it here.

The Iris flower data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper "The use of multiple measurements in taxonomic problems". It is sometimes called Anderson’s Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica, and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.

preprocessing our data

import numpy as np
import scipy.optimize as opt
import pandas as pd

# fix the seed to get the same random order of rows on every run
np.random.seed(4)
# the location of your IRIS.csv
data = pd.read_csv('data/IRIS.csv')
# replace the flower names by the numbers 0, 1, 2
species = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
# shuffle the rows of the data set
data = data.sample(frac=1)
data = data.replace({'species': species})
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
y = y[:, np.newaxis]
# split our data: the first 100 rows for training, the remaining 50 for testing
train_X, test_X = X[:100, :], X[100:, :]
train_y, test_y = y[:100, :], y[100:, :]

sigmoid function

Since this is a classification problem, the output of our hypothesis function h(x) should always lie between 0 and 1, so that it can be read as a probability. To get that, we use the sigmoid function $g(z) = \frac{1}{1 + e^{-z}}$:

# sigmoid function code
def sigmoid(z):
    h = 1 / (1 + np.exp(-z))
    return h
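A quick sanity check of this function, as a minimal sketch: the sigmoid should return exactly 0.5 at z = 0 and approach 0 and 1 at the extremes. For very negative z, `np.exp(-z)` overflows and NumPy emits a warning, even though the result still rounds to 0. The `sigmoid_clipped` variant below is a hypothetical helper, not part of the original code, and the bound of 500 is an arbitrary assumption.

```python
import numpy as np

def sigmoid(z):
    # same definition as in the article
    return 1 / (1 + np.exp(-z))

def sigmoid_clipped(z, bound=500.0):
    # hypothetical variant: clip z so that np.exp never overflows float64
    z = np.clip(z, -bound, bound)
    return 1 / (1 + np.exp(-z))

z = np.array([-10.0, 0.0, 10.0])
print(sigmoid(z))                # close to 0, exactly 0.5, close to 1
print(sigmoid_clipped(-1000.0))  # tiny positive number, no overflow warning
```

For the Iris features the inputs stay small, so the plain version is enough here; the clipped one only matters when theta grows large.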

cost function

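The cost function of logistic regression, for $m$ training examples and the hypothesis $h_\theta(x) = g(\theta^T x)$, is:

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$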

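# cost function
def costFunction(theta, X, y):
    m = X.shape[0]                               # number of examples
    h = sigmoid(X @ theta)                       # predicted probabilities
    temp1 = np.multiply(y, np.log(h))            # term for examples with y = 1
    temp2 = np.multiply((1 - y), np.log(1 - h))  # term for examples with y = 0
    cost = -(1 / m) * np.sum(temp1 + temp2)
    return cost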
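One practical caveat: if a prediction saturates to exactly 0 or 1, `np.log` returns `-inf` and the cost can become `nan`. The sketch below shows one common workaround, clipping the predictions before taking the logarithm. The name `cost_function_clipped`, the tolerance `eps` and the toy data are assumptions made for illustration, not part of the original code.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost_function_clipped(theta, X, y, eps=1e-12):
    # same cross-entropy as costFunction, but h is kept inside [eps, 1 - eps]
    # so that np.log never receives exactly 0 (eps is an assumed tolerance)
    m = X.shape[0]
    h = np.clip(sigmoid(X @ theta), eps, 1 - eps)
    return -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# toy data (made up for illustration): a bias column plus one feature
X_toy = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]])
y_toy = np.array([0.0, 0.0, 1.0, 1.0])

# with theta = 0 every prediction is 0.5, so the cost equals log(2)
print(cost_function_clipped(np.zeros(2), X_toy, y_toy))
# a theta that separates the two classes gives a lower cost
print(cost_function_clipped(np.array([0.0, 3.0]), X_toy, y_toy))
```

The theta = 0 case is a handy test for any implementation of this cost: whatever the data, the result should be log(2), roughly 0.693.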

gradient descent

Gradient descent keeps changing the parameters to reduce the cost function gradually. With each iteration we come closer to the global minimum, and in each iteration all the parameters must be updated simultaneously. The size of a step is determined by the parameter alpha (the learning rate). We need to choose alpha carefully: if we pick a small alpha, convergence is slow, and if we pick a large alpha, gradient descent can overshoot the minimum and fail to converge.
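The update rule described above can be sketched as a plain loop. This is only an illustration: the helper name `gradient_descent`, the values of `alpha` and `n_iters`, and the toy data are all assumptions, and the gradient expression it uses is the one derived in the next step.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    # hypothetical helper: plain batch gradient descent for binary
    # logistic regression; alpha and n_iters are illustrative defaults
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m  # gradient of the cost
        theta = theta - alpha * grad               # update all parameters at once
    return theta

# toy data (made up): bias column plus one feature, classes overlap slightly
X_toy = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, 0.5],
                  [1.0, -0.5], [1.0, 1.0], [1.0, 2.0]])
y_toy = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

theta = gradient_descent(X_toy, y_toy)
print(theta)
print((sigmoid(X_toy @ theta) >= 0.5).astype(int))  # predicted classes
```

Because the whole vector theta is replaced in one assignment, every parameter is updated from the same old values, which is what "simultaneously" means here.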

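After calculating the derivative of the cost function with respect to each parameter $\theta_j$, where $h_\theta(x) = g(\theta^T x)$ is the hypothesis, we get:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$$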

# gradient of the cost function
def gradient(theta, X, y):
    m = X.shape[0]
    temp = sigmoid(np.dot(X, theta)) - y  # prediction error for each example
    grad = np.dot(temp.T, X).T / m        # average of error * feature, per parameter

    return grad
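A useful habit when deriving gradients by hand is to compare them against a numerical estimate. The sketch below does this with central finite differences on synthetic data. The helper `numerical_gradient`, the step size and the random data are assumptions for illustration; `cost` and `gradient` restate the article's functions in compact form.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    m = X.shape[0]
    h = sigmoid(X @ theta)
    return -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    m = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m

def numerical_gradient(f, theta, X, y, h=1e-6):
    # central differences: perturb one parameter at a time
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = h
        grad[j] = (f(theta + step, X, y) - f(theta - step, X, y)) / (2 * h)
    return grad

rng = np.random.default_rng(0)  # synthetic data, for illustration only
X_toy = np.hstack((np.ones((20, 1)), rng.normal(size=(20, 3))))
y_toy = (rng.random(20) > 0.5).astype(float)
theta = rng.normal(size=4)

diff = np.max(np.abs(gradient(theta, X_toy, y_toy)
                     - numerical_gradient(cost, theta, X_toy, y_toy)))
print(diff)  # should be tiny if the analytic gradient is right
```

If the two gradients disagree by more than a small tolerance, the derivative or its vectorized form most likely contains a mistake.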

cost function optimization

Before running our optimization function we need to add the bias column to the data and initialize our parameter theta:

m = train_X.shape[0]      # number of training rows
m_test = test_X.shape[0]  # number of test rows
# add a column of ones to the data set (the bias)
train_X = np.hstack((np.ones((m, 1)), train_X))
test_X = np.hstack((np.ones((m_test, 1)), test_X))
# number of classes
k = 3
# number of parameters per class (features plus bias)
n = train_X.shape[1]
# initialize theta
theta = np.zeros((n, k))

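Now we can run our optimization function. I am going to use the SciPy fmin_cg function, training one binary classifier per class (class i against the rest) and storing its parameters in column i of theta: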

for i in range(k) :
    theta[:,i] = opt.fmin_cg(
        f=costFunction,
        x0=theta[:,i],
        fprime=gradient,
        args=(train_X,(train_y == i).flatten()),
        maxiter=50
    )

Printing theta shows the optimized parameters, one column per class:

print(theta)

Now let’s check the accuracy of our model; for that we use the test set:

prediction = np.argmax(test_X @ theta,axis=1) 
accuracy = np.mean(prediction == test_y.flatten()) * 100
accuracy
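Accuracy is a single number; a confusion matrix additionally shows which classes get mixed up. The sketch below builds one with NumPy only. The helper `confusion_matrix` and the labels are made up for illustration; with the article's variables you would pass `test_y.flatten()` and `prediction`, assuming both are integer arrays.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # hypothetical helper: rows are true classes, columns are predicted classes
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# made-up labels for illustration only
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 2, 2, 2, 1])

cm = confusion_matrix(y_true, y_pred, 3)
print(cm)
print(np.trace(cm) / cm.sum() * 100)  # accuracy in percent
```

The diagonal counts the correct predictions, so the accuracy is the trace divided by the total number of samples.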

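With this seed and split you should get 100% accuracy on the 50 test samples, so our model is doing very well. Keep in mind that this is a small test set, so a different shuffle may give a slightly different figure.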

The purpose of implementing machine learning algorithms from scratch is to build a strong intuition for the mathematics behind them.

Next time, I will solve the same problem using sklearn so we can compare the two methods and understand what sklearn does under the hood.