Logistic Regression: Sigmoid Function and Threshold

Mukesh Chaudhary
6 min read · Aug 21, 2020


Understanding the sigmoid function and the threshold of logistic regression with a real-data example.

In this blog, we describe the sigmoid function and the threshold of logistic regression in terms of real data. Linear regression and logistic regression are benchmark algorithms in the data science field. It is good practice to fit these regression models to the training data before trying other algorithms: if we have a linear problem we can use a linear regression model, and if we have a classification problem we can use a logistic regression model. These models give us a basic idea of what is going on in our data set.

Logistic Regression:

As the name suggests, it is a classification algorithm, used in classification tasks. To assign each prediction to a class, we need to convert the predictions to probabilities (i.e., values between 0 and 1). To achieve that we use the sigmoid function, which maps every real value to a value between 0 and 1:

\sigma(z) = \frac{1}{1 + e^{-z}}

Sigmoid function in code:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.dot(X, weight)   # linear combination of features and weights
h = sigmoid(z)          # predicted probabilities
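
As a quick sanity check, large negative inputs map close to 0, zero maps to exactly 0.5, and large positive inputs map close to 1. A minimal sketch, reusing the sigmoid function defined above:

print(sigmoid(np.array([-5.0, 0.0, 5.0])))
# [0.00669285 0.5        0.99330715]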

Logistic regression is also a transformation of linear regression using the sigmoid function. If we compare it with the linear regression equation, the two look almost the same. Let's describe in a bit more detail how the sigmoid function works here. A simple linear regression model is

y = \theta_0 + \theta_1 x

If we translate the above equation into our data, we get

\text{income} = \theta_0 + \theta_1 \cdot \text{age}

When we want to apply this to a binary dataset, the expression for a logistic regression model looks like this:

P(\text{income} > 4000 \mid \text{age}) = \frac{1}{1 + e^{-(\theta_0 + \theta_1 \cdot \text{age})}}

Here we assume income above 4000 USD is one class (i.e., 1) and income below 4000 USD is the other class (i.e., 0).

If we let

z = \theta_0 + \theta_1 \cdot \text{age}

then the right-hand side is exactly our sigmoid function formula. In fact, this is the inner mechanism of logistic regression.
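
To make the mechanism concrete, here is a tiny sketch with made-up weights (theta_0 and theta_1 below are hypothetical values, not fitted ones), reusing the sigmoid function from above:

theta_0, theta_1 = -8.0, 0.2        # hypothetical intercept and slope
for age in (25, 40, 60):
    z = theta_0 + theta_1 * age     # the linear part
    p = sigmoid(z)                  # probability that income > 4000 USD
    print(age, round(p, 3))
# 25 0.047
# 40 0.5
# 60 0.982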

Threshold

In the above equation, 4000 USD is the threshold point where we split the binary data into two classes. This depends on the company's business requirements and may vary across companies. For example, when predicting whether an email is spam or not, we can set a lower threshold: in that case we don't want to lose any information, so we try to catch as many spam emails as possible by lowering the threshold. However, in a health or food quality case, we set a higher threshold because we don't want any defective product released from the manufacturing company.
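
To see how the threshold moves this trade-off, here is a minimal sketch on hypothetical values (probs and y_true below are made up, not from the income data):

import numpy as np
from sklearn.metrics import precision_score, recall_score

probs = np.array([0.2, 0.4, 0.6, 0.8, 0.9])   # hypothetical predicted probabilities
y_true = np.array([0, 1, 0, 1, 1])            # hypothetical true labels

# lower threshold -> higher recall; higher threshold -> higher precision
for threshold in (0.3, 0.5, 0.7):
    preds = (probs >= threshold).astype(int)
    print(threshold, precision_score(y_true, preds), recall_score(y_true, preds))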

Loss Function

Functions have parameters/weights (represented by theta in our notation), and we want to find the best values for them. To start, we pick initial values, and we need a way to measure how well the algorithm performs using those weights. That measure is computed using the loss function, defined as

J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log(h^{(i)}) + (1 - y^{(i)})\log(1 - h^{(i)})\right]

Loss function in code:
def loss(h, y):
    return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
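
The loss is small when predictions match the labels and large when they are confidently wrong. A quick sketch with hypothetical values, reusing the loss function above:

y = np.array([1.0, 0.0])
good_h = np.array([0.9, 0.1])   # confident and correct
bad_h = np.array([0.1, 0.9])    # confident and wrong
print(loss(good_h, y))          # ~0.105
print(loss(bad_h, y))           # ~2.303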

Gradient descent

Our goal is to minimize the loss function, and the way we achieve it is by increasing/decreasing the weights, i.e., fitting them. The derivative of the loss function with respect to each weight tells us how the loss would change if we modified that parameter:

\frac{\partial J(\theta)}{\partial \theta} = \frac{1}{m} X^{T}(h - y)

gradient = np.dot(X.T, (h - y)) / y.shape[0]   # average gradient over the m samples

To update the weights, we have

# update the weights
lr = 0.01   # lr = learning rate
weight = weight - lr * gradient
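
Putting the pieces together, a minimal sketch of the whole training loop on hypothetical toy data (X, y, the learning rate, and the iteration count below are all placeholders):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# hypothetical toy data: one feature, binary labels
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

weight = np.zeros(X.shape[1])   # start from zero weights
lr = 0.01                       # learning rate

for _ in range(10000):
    h = sigmoid(np.dot(X, weight))             # forward pass: probabilities
    gradient = np.dot(X.T, (h - y)) / y.size   # gradient of the loss
    weight = weight - lr * gradient            # gradient descent step

print(weight)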

Predictions

By calling the sigmoid function we get the probability that some input x belongs to class 1. Let's take all probabilities ≥ 0.5 as class 1 and all probabilities < 0.5 as class 0. In practice, this threshold should be defined depending on the business problem we are working on.

def predict_probs(X, weight):
    return sigmoid(np.dot(X, weight))

def predict(X, weight, threshold=0.5):
    return predict_probs(X, weight) >= threshold
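
A quick usage sketch, reusing the two functions above with hypothetical feature values and a hypothetical weight vector:

import numpy as np

X = np.array([[-1.0], [2.0]])     # hypothetical feature values
weight = np.array([1.0])          # hypothetical fitted weight
print(predict_probs(X, weight))   # [0.26894142 0.88079708]
print(predict(X, weight))         # [False  True]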

Implementation in Python

Here, logistic regression is implemented as a manually built class and evaluated. We also use the LogisticRegression class from the sklearn library and evaluate it. After that, we analyze and compare the results from both.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")
# import sklearn library
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import LinearRegression
# plot
np.random.seed(1234)
age = np.random.uniform(18, 65, 100)
income = np.random.normal((age/10), 0.5)
age = age.reshape(-1,1)
fig = plt.figure(figsize=(8,6))
fig.suptitle('age vs income', fontsize=16)
plt.scatter(age, income)
plt.xlabel('age', fontsize=14)
plt.ylabel('monthly income', fontsize=14)
plt.show()
# convert income to binary data (class 1 if income > 4, i.e. 4000 USD)
income_bin = income > 4
income_bin = income_bin.astype(int)
print(income_bin)
# plot binary data (classification)
fig = plt.figure(figsize=(8, 6))
fig.suptitle('age vs binary income', fontsize=16)
plt.scatter(age, income_bin)
plt.xlabel('age', fontsize=14)
plt.ylabel('monthly income (> or < 4000)', fontsize=14)
plt.show()
# manually built class for logistic regression
class LogisticRegression:
    def __init__(self, lr=0.01, num_iter=100000, fit_intercept=True, verbose=False):
        self.lr = lr
        self.num_iter = num_iter
        self.fit_intercept = fit_intercept
        self.verbose = verbose

    def __add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate((intercept, X), axis=1)

    def __sigmoid(self, z):
        return 1 / (1 + np.exp(-z))

    def __loss(self, h, y):
        return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

    def fit(self, X, y):
        if self.fit_intercept:
            X = self.__add_intercept(X)

        # weights initialization
        self.weight = np.zeros(X.shape[1])

        for i in range(self.num_iter):
            z = np.dot(X, self.weight)
            h = self.__sigmoid(z)
            gradient = np.dot(X.T, (h - y)) / y.size
            self.weight -= self.lr * gradient

            if self.verbose and i % 10000 == 0:
                z = np.dot(X, self.weight)
                h = self.__sigmoid(z)
                print(f'loss: {self.__loss(h, y)} \t')

    def predict_prob(self, X):
        if self.fit_intercept:
            X = self.__add_intercept(X)

        return self.__sigmoid(np.dot(X, self.weight))

    def predict(self, X, threshold):
        return self.predict_prob(X) >= threshold
# fit the manual model
model = LogisticRegression(lr=0.1, num_iter=300000)
model.fit(age, income_bin)
# predict classes
preds = model.predict(age, 0.5)
# evaluation: accuracy
(preds == income_bin).mean()
# output
# 0.93
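
For a like-for-like comparison with the sklearn results below, we can also score the manually built model with the same metrics. A minimal sketch, reusing preds and income_bin from above:

from sklearn.metrics import confusion_matrix, precision_score, recall_score

# metrics for the manually built model
print(confusion_matrix(income_bin, preds))
print(precision_score(income_bin, preds))
print(recall_score(income_bin, preds))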
# fit the sklearn model
# note: this import shadows the manual LogisticRegression class defined above
from sklearn.linear_model import LogisticRegression

lg = LogisticRegression()
lg.fit(age, income_bin)
# predict and evaluate
predict_income_bin = lg.predict(age)
# import metrics
from sklearn.metrics import confusion_matrix, precision_score, recall_score
print("confusion matrix")
print(confusion_matrix(income_bin, predict_income_bin))
print("Precision Score")
print(precision_score(income_bin, predict_income_bin))
print("Recall Score")
print(recall_score(income_bin, predict_income_bin))
# Output
# Precision Score: 0.9433962264150944
# Recall Score:    0.9433962264150944

Conclusion:

Logistic regression is used for binary classification problems, and the sigmoid function is at the heart of the algorithm. The input to the sigmoid is the same linear equation we saw in linear regression, and the probability outcome is divided into classes via a threshold. The main advantage here is that we can set the threshold as per business requirements. We analyzed both classes, the manually built one and the one from sklearn, and their scores differed by only about 0.01. So I think this gives more clarity on logistic regression at the from-scratch level. For the full code, please visit the github link.

References

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
https://scikit-learn.org/stable/modules/model_evaluation.html
https://en.wikipedia.org/wiki/Logistic_regression
https://en.wikipedia.org/wiki/Logistic_function
https://medium.com/analytics-vidhya/coding-logistic-regression-in-python-2ad6a0214b66
