Machine Learning Tips: Adjusting the Decision Threshold for Binary Classification

Anthony Demeusy
7 min read · Sep 24, 2023


Probabilistic models such as logistic regression or Naive Bayes are commonly used for binary classification tasks. However, documentation, tutorials and training material often overlook a crucial step of the process: choosing a decision threshold.

This article offers a brief explanation of the decision threshold concept in binary classification, illustrates the motivation for adjusting it, and provides code examples for implementing custom decision thresholds with Python and scikit-learn.

What is a decision threshold?

In binary classification, the task at hand is to determine whether a tested element should be assigned label A or label B, indicating whether it belongs to one group or the other.

In this case, a probabilistic model returns a probability, denoted as ‘p’, that an element belongs to class A, and the probability of it belonging to class B can then be expressed simply as 1 − p.

The key point here is that probabilistic models return probabilities, not class assignments or labels.

To transition from probabilities to class or label predictions and construct a classifier, it is necessary to establish a decision threshold. Then, if the probability ‘p’ exceeds the threshold, the element is considered to belong to group A; otherwise, it is assigned to group B. The flowchart below provides a summary of the entire process.

A common approach is to use 50% as the decision threshold. In this case, label A is assigned if the probability p is above 50%, and label B is assigned otherwise. Nonetheless, in many cases, adjusting the decision threshold can lead to better real-life outcomes.
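
This mapping from probabilities to labels is straightforward to express in code. As a minimal sketch, assuming a hypothetical array of predicted probabilities for class A:

# Hypothetical probabilities that each element belongs to class A
import numpy as np
p = np.array([0.15, 0.45, 0.62, 0.91])

# Applying a 50% decision threshold to obtain labels
decision_threshold = 0.5
labels = np.where(p > decision_threshold, 'A', 'B')
print(labels) # ['B' 'B' 'A' 'A']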

Motivation

To understand the motivation behind adjusting the decision threshold, consider a model that aims to detect a serious medical condition, with two classes: positive (indicating the patient has the condition) and negative (indicating the patient is healthy). Using a decision threshold of 50% might result in a significant number of false negative test results, meaning the probabilistic model indicates a patient has less than a 50% chance of having the condition when they actually do, potentially leading to serious consequences for the patient’s health. In such situations, it becomes imperative to opt for a lower decision threshold, even if it increases the false positive rate. Depending on the situation, patients can be considered positive even with probabilities as low as 20% or 10%.

Now consider that you are managing a support call center and an AI model can predict whether your customers may face a known minor issue with a product. Considering cost, you may decide to proactively reach out only to customers who have a predicted probability of facing the issue of at least 70% or 80%, and provide support to the others only if they contact support, at which point you’ll know they are indeed facing the issue.

As these examples illustrate, the value of the decision threshold directly impacts the rates of false positive and false negative results. A trade-off is often necessary, and the optimal threshold depends on the cost and consequences associated with each type of error.

Implementation

A situation where a custom decision threshold is often required arises when constructing a classifier based on a logistic regression model. This can be demonstrated using the Pima Indians Diabetes dataset from UCI Machine Learning, which is available on Kaggle¹. The dataset comprises diagnostic measurements used to predict whether a patient has diabetes. The two classes are represented by 1 (indicating the patient has diabetes) and 0 (indicating the patient does not have diabetes).

Using Python, the first steps consist of loading the dataset, defining the predictors and the label, and creating both a training dataset and a test dataset.

# Importing the dataset
import pandas as pd
df = pd.read_csv("diabetes.csv")

# Creating feature and label vectors for training and test datasets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(df[df.columns[:-1]], df[df.columns[-1:]], test_size=0.2, random_state=50)

A model is then instantiated and fitted on the training dataset using the scikit-learn library².

# Instantiating the model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression(random_state=50, solver='lbfgs', max_iter=1000)
# Fitting the model using training data - ravel() flattens the single-column
# label DataFrame into the 1-D array expected by scikit-learn
model.fit(X_train, y_train.values.ravel())

Once this is done, a decision threshold is defined and probability predictions are computed on the test dataset.

# Defining the decision threshold - In this case, the decision threshold used is 30%
decision_threshold = 0.3
# Predicting probability for the test dataset
y_score = model.predict_proba(X_test)
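
Note that ‘predict_proba’ returns one column per class, in the order given by the model’s ‘classes_’ attribute. For this dataset, the classes are [0, 1], so ‘y_score[:, 1]’ holds the predicted probability of class 1 (diabetes):

# Checking the class order - the columns of y_score follow model.classes_
print(model.classes_) # [0 1]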

You can then compute the confusion matrix, which will be specific to the decision threshold set in the previous step.

# Creating and displaying confusion matrix
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
cm = confusion_matrix(y_test.to_numpy()[:,0], (y_score[:,1]>decision_threshold).astype(int),labels=[1, 0])
disp=ConfusionMatrixDisplay(confusion_matrix=cm,display_labels=[1,0])
disp.plot(cmap="viridis")
disp.im_.set_clim(0, len(y_score))
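
Since ‘labels=[1, 0]’ is passed, the positive class comes first: true positives appear in the top-left cell and true negatives in the bottom-right, which is the convention used in the remainder of this article.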

Metrics of interest can also easily be obtained, for instance the sensitivity (or true positive rate) and the specificity (or true negative rate).

TP=cm[0,0] # True positive test results
FN=cm[0,1] # False negative test results
FP=cm[1,0] # False positive test results
TN=cm[1,1] # True negative test results
print('Sensitivity : {}%'.format(round(TP/(TP+FN)*100,1)))
print('Specificity : {}%'.format(round(TN/(FP+TN)*100,1)))

This code returns:
- Sensitivity : 67.9%
- Specificity : 70.3%
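
Since the sensitivity is the recall of the positive class and the specificity is the recall of the negative class, the same values can also be obtained with scikit-learn’s built-in ‘recall_score’ function, as a quick cross-check:

# Cross-checking sensitivity and specificity with recall_score
from sklearn.metrics import recall_score
y_predict_custom = (y_score[:,1]>decision_threshold).astype(int)
print('Sensitivity : {}%'.format(round(recall_score(y_test.to_numpy()[:,0], y_predict_custom, pos_label=1)*100,1)))
print('Specificity : {}%'.format(round(recall_score(y_test.to_numpy()[:,0], y_predict_custom, pos_label=0)*100,1)))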

It is also possible to compare the confusion matrices for different decision thresholds.

# Comparing 3 different decision thresholds
DecisionThresholds = [0.3, 0.5, 0.7]

import matplotlib.pyplot as plt
fig, ax = plt.subplots(1, 3, figsize=(15, 5.5))
fig.suptitle('Confusion matrix for different decision thresholds', fontsize=15)

for i, threshold in enumerate(DecisionThresholds):
    cm = confusion_matrix(y_test.to_numpy()[:,0], (y_score[:,1]>threshold).astype(int), labels=[1, 0])
    ax[i].set_xlabel('1 0\n Predicted label', fontsize=12)
    ax[i].set_ylabel(' True label \n0 1', fontsize=12)
    ax[i].set_title('Decision threshold = {}'.format(threshold))
    ax[i].matshow(cm, cmap="viridis", vmin=0, vmax=len(y_score))
    ax[i].set_xticks([])
    ax[i].set_yticks([])
    # Writing each count inside its cell
    for j in range(2):
        for k in range(2):
            ax[i].text(x=j, y=k, s=cm[k, j], va='center', ha='center', bbox=dict(color='white'))

As expected, the number of false negatives increases with the decision threshold (which can be problematic), while the number of false positives decreases. This can be visualized using the following code.

# Creating an array of decision thresholds
import numpy as np
ThresholdVector = np.arange(0, 1.01, 0.001)

# Computing the corresponding test results for each threshold
TPVector = np.empty(0)
FNVector = np.empty(0)
FPVector = np.empty(0)
TNVector = np.empty(0)
for threshold in ThresholdVector:
    cm = confusion_matrix(y_test.to_numpy()[:,0], (y_score[:,1]>threshold).astype(int), labels=[1, 0])
    TP = cm[0,0] # True positive test results
    FN = cm[0,1] # False negative test results
    FP = cm[1,0] # False positive test results
    TN = cm[1,1] # True negative test results
    TPVector = np.append(TPVector, TP)
    FNVector = np.append(FNVector, FN)
    FPVector = np.append(FPVector, FP)
    TNVector = np.append(TNVector, TN)

# Plotting the results
import matplotlib.pyplot as plt
plt.plot(ThresholdVector, TPVector, color='#9200D2', label="True positives")
plt.plot(ThresholdVector, FNVector, color='#009698', label="False negatives")
plt.plot(ThresholdVector, FPVector, color='#FD3500', label="False positives")
plt.plot(ThresholdVector, TNVector, color='#1240AA', label="True negatives")
plt.legend(bbox_to_anchor=(1, 0.85))
plt.xlabel("Decision threshold")
plt.ylabel("Result count")
plt.title("Results per category against decision threshold")
plt.show()

Finally, you can verify that a decision threshold of 50% yields the same results as the ‘predict’ method from scikit-learn. This method directly computes label predictions, assuming a fixed decision threshold of 50% that cannot be adjusted.

# Classification with 'predict' method, with fixed decision threshold
y_predict = model.predict(X_test)

# Displaying the confusion matrix
cm = confusion_matrix(y_test, y_predict, labels=[1, 0])
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=[1,0])
disp.plot(cmap="viridis")
disp.im_.set_clim(0, len(y_predict))
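
As a sanity check, you can also confirm programmatically that both approaches produce identical predictions, barring probabilities falling exactly on the 50% threshold:

# 'predict' should match thresholding 'predict_proba' at 50%
import numpy as np
print(np.array_equal(y_predict, (y_score[:,1]>0.5).astype(int))) # True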

Conclusion

In summary, this article provides a concise explanation of the decision threshold concept in binary classification and illustrates its importance in building a classifier. Though it is sometimes overlooked, adjusting the decision threshold is often a simple and efficient way to enhance operational outcomes in binary classification problems. It serves as a valuable complement to feature engineering and model fine-tuning, in particular in scenarios where different types of errors carry vastly different costs and consequences.

While libraries like Scikit make the implementation of custom decision thresholds relatively straightforward, a solid understanding of this concept is fundamental for grasping more advanced topics related to binary classification, such as the Receiver Operating Characteristic (ROC) curve, the Precision-Recall Curve (PRC), and their associated metrics.
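
As a first pointer towards those topics, scikit-learn’s ‘roc_curve’ function returns the candidate decision thresholds together with the corresponding false and true positive rates; a minimal sketch, reusing the ‘y_score’ computed above:

# Each entry of 'thresholds' is a candidate decision threshold
from sklearn.metrics import roc_curve, roc_auc_score
fpr, tpr, thresholds = roc_curve(y_test.to_numpy()[:,0], y_score[:,1])
print('AUC : {}'.format(round(roc_auc_score(y_test.to_numpy()[:,0], y_score[:,1]), 3)))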

If you found this article helpful, please show your support by clapping for this article and considering subscribing for more articles on machine learning and data analysis. Your engagement and feedback are highly valued as they play a crucial role in the continued delivery of high-quality content.

You can also support my work by buying me a coffee. Your support helps me continue to create and share informative content. It’s a simple and appreciated gesture that keeps the momentum going: Buy Me a Coffee.

References

[1] UCI Machine Learning. Pima Indians Diabetes Database. Kaggle. https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

[2] Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825–2830, 2011.
