Logistic Regression Algorithm from Scratch in Python
Many machine learning packages and frameworks will train a model for you, but they don’t show what happens behind the scenes: what happens to your data at each step and the mathematics involved. So in this article I am going to implement the logistic regression algorithm from scratch, without any machine learning framework.
The data set I am going to use is the Iris flower data set; you can find it here.
The Iris flower data set is a multivariate data set introduced by the British statistician and biologist Ronald Fisher in his 1936 paper The use of multiple measurements in taxonomic problems. It is sometimes called Anderson’s Iris data set because Edgar Anderson collected the data to quantify the morphologic variation of Iris flowers of three related species. The data set consists of 50 samples from each of three species of Iris (Iris Setosa, Iris virginica, and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters.
preprocessing our data
import numpy as np
import scipy.optimize as opt
import pandas as pd

# fix the seed to get the same random order of rows
np.random.seed(4)
# the location of your IRIS.csv
data = pd.read_csv('data/IRIS.csv')
# replace flower names by the numbers 0, 1, 2
species = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
# shuffle the rows of the dataset
data = data.sample(frac=1)
data = data.replace({'species': species})

X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values
y = y[:, np.newaxis]
# split our data: first 100 rows for training, the rest for testing
train_X, test_X = X[:100, :], X[100:, :]
train_y, test_y = y[:100, :], y[100:, :]
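The split above takes the first 100 shuffled rows for training and the remaining 50 for testing, so the three classes are not guaranteed to be equally represented in each part. A minimal sketch of how one might check the balance with np.bincount, using a synthetic label column (y_demo is a stand-in I made up, not the real data):

```python
import numpy as np

rng = np.random.default_rng(4)
# synthetic stand-in for the shuffled label column: 50 samples per class
y_demo = rng.permutation(np.repeat([0, 1, 2], 50))[:, np.newaxis]

# same slicing as in the article
train_y_demo, test_y_demo = y_demo[:100, :], y_demo[100:, :]

# count how many samples of each class landed in each part
train_counts = np.bincount(train_y_demo.flatten(), minlength=3)
test_counts = np.bincount(test_y_demo.flatten(), minlength=3)
print(train_counts, test_counts)
```

The two count vectors always add up to 50 per class, but each one on its own can be uneven.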
sigmoid function
Since this is a classification problem, the output of our hypothesis function h(x) should always lie between 0 and 1. So we pass the linear combination z = theta^T x through the sigmoid function g(z):
g(z) = 1 / (1 + e^(-z))
#sigmoid function code
def sigmoid(z):
    h = 1 / (1 + np.exp(-z))
    return h
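As a quick sanity check, the sketch below evaluates the sigmoid at a few points. The sample values in z are mine, chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# illustrative inputs (not taken from the iris data)
z = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
values = sigmoid(z)
print(values)
```

The output stays strictly between 0 and 1, sigmoid(0) is exactly 0.5, and the values are symmetric in the sense that sigmoid(-z) = 1 - sigmoid(z).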
cost function
The cost function of logistic regression over m training examples is:
J(theta) = -(1/m) * sum_i [ y_i * log(h(x_i)) + (1 - y_i) * log(1 - h(x_i)) ]
#cost function
def costFunction(theta, X, y):
    m = X.shape[0]
    h = sigmoid(X @ theta)
    temp1 = np.multiply(y, np.log(h))
    temp2 = np.multiply((1 - y), np.log(1 - h))
    cost = -(1 / m) * np.sum(temp1 + temp2)
    return cost
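A useful check on costFunction: with theta set to zeros every prediction is 0.5, so the cost equals log(2), roughly 0.693, whatever the data. The tiny X_demo and y_demo below are made up for this check.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def costFunction(theta, X, y):
    m = X.shape[0]
    h = sigmoid(X @ theta)
    return -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# made-up data: 4 samples, a bias column plus 2 features
X_demo = np.array([[1.0, 0.5, 1.2],
                   [1.0, 1.5, 0.3],
                   [1.0, 3.1, 2.2],
                   [1.0, 2.0, 0.1]])
y_demo = np.array([0, 0, 1, 1])

# all-zero parameters give h = 0.5 for every sample
cost_at_zero = costFunction(np.zeros(3), X_demo, y_demo)
print(cost_at_zero, np.log(2))
```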
gradient descent
Gradient descent repeatedly adjusts the parameters to reduce the cost function. With each iteration we move closer to the minimum, which for the logistic regression cost is the global minimum because the cost is convex. In each iteration all parameters must be updated simultaneously. The size of each step is determined by the parameter alpha (the learning rate), which we need to choose carefully: if alpha is too small, convergence is slow, and if alpha is too large, gradient descent can overshoot the minimum and fail to converge.
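To make the effect of alpha concrete, here is a small sketch of gradient descent on the toy function f(x) = x**2 rather than on the logistic cost. The function, the starting point and the three alpha values are assumptions picked for illustration.

```python
def gradient_descent(alpha, x0=1.0, steps=50):
    """Minimize f(x) = x**2 (gradient 2*x) with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x = x - alpha * 2 * x  # update rule: x := x - alpha * f'(x)
    return x

x_small = gradient_descent(alpha=0.01)  # moves toward 0, but slowly
x_good = gradient_descent(alpha=0.4)    # ends very close to the minimum at 0
x_large = gradient_descent(alpha=1.1)   # overshoots more on every step and diverges

print(x_small, x_good, x_large)
```

After the same 50 steps, the small alpha is still far from the minimum, the moderate one has essentially reached it, and the large one has moved away from it.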
Taking the derivative of the cost function with respect to each parameter theta_j gives:
dJ/dtheta_j = (1/m) * sum_i (h(x_i) - y_i) * x_ij
#gradient of the cost function
def gradient(theta, X, y):
    m = X.shape[0]
    temp = sigmoid(np.dot(X, theta)) - y
    grad = np.dot(temp.T, X).T / m
    return grad
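One way to gain confidence in the gradient function is to compare it with a finite-difference approximation of the cost. This is a sketch on randomly generated data; numerical_gradient is a helper I introduce here for the check and is not part of the article's code.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def costFunction(theta, X, y):
    m = X.shape[0]
    h = sigmoid(X @ theta)
    return -(1 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    m = X.shape[0]
    return X.T @ (sigmoid(X @ theta) - y) / m

def numerical_gradient(theta, X, y, eps=1e-6):
    """Central-difference approximation of the gradient (hypothetical helper)."""
    approx = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        approx[j] = (costFunction(theta + step, X, y)
                     - costFunction(theta - step, X, y)) / (2 * eps)
    return approx

# random demo data: bias column plus 2 features, binary labels
rng = np.random.default_rng(0)
X_demo = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y_demo = rng.integers(0, 2, size=20)
theta_demo = rng.normal(size=3)

max_diff = np.max(np.abs(gradient(theta_demo, X_demo, y_demo)
                         - numerical_gradient(theta_demo, X_demo, y_demo)))
print(max_diff)
```

If the analytic gradient is right, the largest difference should be tiny compared with the gradient values themselves.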
cost function optimization
Before running our optimization function, we need to add the bias column to the data and initialize our parameter theta.
m = train_X.shape[0]  # number of rows in the training set
m_test = test_X.shape[0]  # number of rows in the test set
# add a column of ones to the data set (the bias)
train_X = np.hstack((np.ones((m, 1)), train_X))
test_X = np.hstack((np.ones((m_test, 1)), test_X))
# number of classes
k = 3
n = train_X.shape[1]
# initialize theta
theta = np.zeros((n, k))
Now we can run our optimization function. I am going to use SciPy's fmin_cg function, training one binary classifier per class (one-vs-all):
for i in range(k):
    theta[:, i] = opt.fmin_cg(
        f=costFunction,
        x0=theta[:, i],
        fprime=gradient,
        args=(train_X, (train_y == i).flatten()),
        maxiter=50
    )
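The expression (train_y == i).flatten() is what turns the three-class problem into three binary ones. A sketch with a made-up label column (labels is hypothetical, shaped like train_y) shows the masks it produces:

```python
import numpy as np

# made-up labels in the same (m, 1) shape as train_y
labels = np.array([[0], [2], [1], [2], [0]])

for i in range(3):
    # True where the sample belongs to class i, False for the other two classes
    binary_target = (labels == i).flatten()
    print(i, binary_target.astype(int))

# the target vector used when training the classifier for class 2
mask_class_2 = (labels == 2).flatten()
```

Each pass of the loop therefore fits one column of theta to answer the question "is this sample class i or not?".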
The optimized value of the parameter theta should look something like this:
print(theta)
Now let’s check the accuracy of our model on the test set:
prediction = np.argmax(test_X @ theta,axis=1)
accuracy = np.mean(prediction == test_y.flatten()) * 100
accuracy
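Note that the prediction takes argmax over the raw scores test_X @ theta without applying the sigmoid. Because the sigmoid is strictly increasing, this picks the same class as argmax over the probabilities would. A small sketch with made-up scores:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# made-up scores: 3 samples (rows) x 3 classes (columns)
scores = np.array([[ 2.0, -1.0, -3.0],
                   [-4.0,  0.5, -0.2],
                   [-5.0, -0.1,  1.7]])

# class with the highest raw score vs. class with the highest probability
pred_from_scores = np.argmax(scores, axis=1)
pred_from_probs = np.argmax(sigmoid(scores), axis=1)
print(pred_from_scores, pred_from_probs)
```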
You should get 100% accuracy on the test set, so our model is doing very well on this split.
The purpose of implementing machine learning algorithms from scratch is to build a strong intuition for the mathematics behind them.
Next time, I will solve the same problem using sklearn so we can compare the two methods and understand what sklearn does under the hood.