Classification with Logistic Regression.

M. Madhusanka
Published in Analytics Vidhya · 6 min read · Jun 12, 2020
Simple Classification Representation.

Classification is the process of determining which of a set of categories (also known as sub-populations) a new observation or instance belongs to. Before classifying a new instance, a model must first be trained on already-classified training data; only then is it possible to perform classification for a new member.

In this example data set, the final state of a student (pass or fail) is determined by the marks of two exams. Since only the data is available, a hypothesis with appropriate weights must be formed to determine the state of a new student from their marks.

First, let us understand the algorithm behind logistic regression.

There are mainly two types of classification problems: binary classification and multiclass classification. If the instances of the data set are classified into more than two groups, the problem is recognized as multiclass; in contrast, if only two groups are specified in the data set, it is called binary classification. According to the short description of the problem above, there are only two classes (0 for fail, 1 for pass).
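As a small illustration of the distinction (the labels below are made up), counting the unique labels in a target column tells us which kind of problem we have:

import numpy as np
binary_labels = np.array(['pass', 'fail', 'pass', 'fail'])   #two groups -> binary classification
multiclass_labels = np.array(['cat', 'dog', 'bird', 'dog'])  #more than two groups -> multiclass
print(len(np.unique(binary_labels)))      #2
print(len(np.unique(multiclass_labels)))  #3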

#importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import math
import random
#train test split
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
data = 'exam_data.csv'
Names = ['mark1', 'mark2', 'class']
Data = pd.read_csv(data, names=Names)
#map the string labels to numeric classes
mapping_cls = {'pass': 1, 'fail': 0}
Data['class'] = Data['class'].map(mapping_cls)
#print(Data.head())
#taking features and labels separately
train_data = Data.drop('class', axis=1)
target = Data['class']
#add a column of 1s to the features (intercept term)
m, n = train_data.shape
ones = np.ones((m, 1))
train_data = np.hstack((ones, train_data))
#split train and test data sets
X_train, X_test, y_train, y_test = train_test_split(train_data, target, test_size=0.2)
#reshape the training labels into a column vector
Y_train = np.array([y_train], dtype=np.float64).T
Data Visualization

Binary Classification using logistic regression.

Sigmoid function.

The hypothesis of logistic regression starts from the same linear form h(x) used in linear regression. In linear regression it is used to predict values of a continuous dependent variable; in logistic regression the dependent variable is limited to a discrete set of values, one for each class being predicted.

The purpose of the sigmoid function is to squash the output of the hypothesis into the range (0, 1) so that it can be mapped to one class or the other, ending up as either a “1” or a “0”.

figure 01 — sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

Representation of sigmoid function.

“The probability that y = 1, given x, parameterized by θ”; in other words, h_θ(x) = P(y = 1 | x; θ).

“The probability of y becoming 1 + The probability of y becoming 0 = 1”

By using a threshold value we can predict which class each instance belongs to: when the output of the hypothesis function is below 0.5 the instance is assigned to class zero, and when the output is equal to or greater than 0.5 it is assigned to class one.

figure 2: Graphical representation of the sigmoid.
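As a minimal sketch of this thresholding step (the predict_class helper and the numbers below are illustrative, not part of the original code):

import numpy as np

def sigmoid(x):
    #sigmoid as defined above
    return 1 / (1 + np.exp(-x))

def predict_class(x, theta, threshold=0.5):
    #probability that the instance belongs to class one
    prob = sigmoid(np.dot(x, theta))
    #map the probability to a hard class label
    return int(prob >= threshold)

#made-up instance (with intercept term) and weights for illustration
x = np.array([1.0, 60.0, 70.0])
theta = np.array([-7.0, 0.05, 0.06])
print(predict_class(x, theta))   #prints 1, since sigmoid(0.2) > 0.5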

As mentioned before, the first step in logistic regression is to obtain the hypothesis function: x_0, x_1, x_2, … represent the features of an instance, while θ_0, θ_1, θ_2, … represent the coefficients acting as weights.

figure 03 — linear regression and logistic regression.

The difference between linear regression and logistic regression is that in logistic regression the result of the hypothesis is passed through the sigmoid function.

Cost Function

The cost function measures the performance of a machine learning model: it quantifies the error between the predicted value and the expected value.

When the cost function is plotted against the number of iterations, we look for the global minimum of the curve: the iteration at which the global minimum occurs is the one where the error is lowest.

figure 04 — convex curve

If the squared error used in linear regression were plotted against the number of iterations while using the sigmoid hypothesis, the result would be a non-convex plot where local minima also appear, making it hard to tell at which iteration the error was at its lowest.

figure 05 — cost function

Considering the red line: if the predicted class is 0 and the actual class is also 0, the error is zero; but if the hypothesis predicts 1 while the actual class is 0, the output of the cost function goes to infinity.

figure 08 — hypothesis result against the cost function result.
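A quick numeric illustration of that behaviour (illustrative probabilities only): the term -log(1 - h) grows without bound as the prediction h approaches 1 while the actual class is 0.

import numpy as np
#cost contribution -log(1 - h) when the actual class is 0
for h in [0.01, 0.5, 0.99, 1 - 1e-12]:
    print(h, -np.log(1 - h))
#as h approaches 1 while y = 0, the cost heads towards infinity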
#cost function
def cal_cost(theta, train_set, target):
    m = len(target)
    #predicted probabilities for the current weights
    pred = sigmoid(np.dot(train_set, theta))
    #log-loss cost averaged over all instances
    cost = (-1/m)*np.sum((target*(np.log(pred))) + ((1 - target)*(np.log(1 - pred))))
    return cost
#initialise the weights randomly (intercept plus the two exam marks)
theta = np.random.rand(3, 1)
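With cal_cost and the randomly initialised theta in place, the cost before any training can be checked; this small snippet reuses the X_train and Y_train arrays prepared earlier.

#cost of the random weights before any gradient descent steps
initial_cost = cal_cost(theta, X_train, Y_train)
print('cost before training:', initial_cost)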

Gradient Descent

Gradient descent is an optimization method used to find the most suitable parameter values for the hypothesis function. In other words, it tries to reach the lowest point (the global minimum) of the convex function shown above.

The learning rate determines the size of the steps taken towards the global minimum. With a high learning rate more ground can be covered with each step, but there is also a risk of overshooting the lowest point. The parameter values at the global minimum are also where the value of the cost function is at its lowest, as represented in the convex plot above.

The process of obtaining the weights with lowest error.

Figure 4: Equation for minimizing the cost.

By solving the above equation (differentiating the cost function J(θ)), we get:
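In standard notation (with α the learning rate used in the code below), the weight update applied at every iteration is:

$$\theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$$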

#gradient descent
def gradient_descent(x, y, theta, iterations=100, learning_rate=0.01):
    #x - feature matrix
    #y - labels
    #iterations - number of update steps
    m = len(y)
    cost_history = np.zeros(iterations)
    theta_history = np.zeros((iterations, 3))
    for i in range(iterations):
        #predicted probabilities for the current weights
        predict = sigmoid(np.dot(x, theta))
        #move the weights against the gradient of the cost
        theta = theta - (1/m)*learning_rate*(x.T.dot(predict - y))
        theta_history[i, :] = theta.ravel()
        cost_history[i] = cal_cost(theta, x, y)
    return theta, cost_history, theta_history

l_r = 0.0001
n_itr = 30
theta, cost_history, theta_history = gradient_descent(X_train, Y_train, theta, n_itr, l_r)
Reduction of the cost with every iteration.
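A minimal sketch of how that plot can be produced; it reuses the cost_history array returned above and the matplotlib import from the first code block.

#plot the cost recorded at every iteration of gradient descent
plt.plot(range(n_itr), cost_history)
plt.xlabel('iteration')
plt.ylabel('cost')
plt.title('Cost reduction over iterations')
plt.show()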

Implementing Logistic Regression with scikit-learn.

Suppose a set of data is given to you: a list of patients, some of whom are diagnosed with breast cancer and some of whom are benign (healthy), together with examined features for each patient. Our objective is to develop a classification model, based on the logistic regression theory above, to determine whether a new patient has cancer from the examined features.

The breast cancer database was obtained from the University of Wisconsin Hospitals, Madison, from Dr. William H. Wolberg.

Features of this data set include the id number, Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses, and the Class (2 for benign, 4 for malignant).

https://gist.github.com/sheheran/089555df0b03334c3ff00d1ba6e66415

#import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
#train test split
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
#import data using pandas
df = pd.read_csv('breast-cancer-wisconsin.data.txt')
#replace the missing values marked with '?' and drop the id column
df.replace('?', 0, inplace=True)
df.drop(['id'], axis=1, inplace=True)
#separate features and labels (assuming the label column is named 'class')
X = np.array(df.drop(['class'], axis=1), dtype=np.float64)
y = np.array(df['class'])
#add a column of 1s to the features (intercept term)
m, n = X.shape
ones = np.ones((m, 1))
X = np.hstack((ones, X))
#shuffle and divide data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
#fit a logistic regression classifier and measure its accuracy
clf = LogisticRegression(solver='liblinear', multi_class='ovr')
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)

The accuracy of the model was 96%.
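Once the classifier is trained, clf.predict can classify a new patient; the feature values below are purely hypothetical, and the leading 1 matches the column of ones added to X.

#hypothetical new patient: intercept term followed by the nine examined features
new_patient = np.array([[1, 4, 2, 1, 1, 2, 1, 2, 1, 1]])
print(clf.predict(new_patient))   #2 -> benign, 4 -> malignant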

Special thanks to sentdex for the data set and preprocessing, and Andrew Ng for the theoretical knowledge.

Reference

  1. O. L. Mangasarian and W. H. Wolberg, “Cancer diagnosis via linear programming.” Sep-1990.
