DATA SCIENCE THEORY | COST AND PROFIT | KNIME ANALYTICS PLATFORM

Finding an Optimal Classification Threshold based on Cost and Profit

What’s the cost of a prediction into the wrong target class?

Maarit Widmann

Published in Low Code for Data Science · 8 min read · Dec 8, 2021

Co-author: Alfredo Roccato

In this article, we extend the accuracy statistics of a classification model to a concrete profit. We show how to optimize a classification model for credit scoring based on the expected profit in monetary terms. The same technique could be used for predicting churn, detecting criminality, and for other applications where correct predictions are especially important for one class.

What other strategies can we use to assess the performance of a classification model? Have a look at Confusion Matrix and Class Statistics and Visual Scoring Techniques for Classification Models.

Penalizing and Rewarding Classification Results with a Profit Matrix

Confusion matrix and class statistics summarize the performance of a classification model: the actual and predicted target class distribution, accuracy of the assignment into the positive class, and the ability to detect the positive class events. However, these statistics do not consider the cost of a mistake, that is, a prediction into the wrong target class.

If the target class distribution is unbalanced, predicting events correctly into the minority class requires high model performance, whereas predicting events into the majority class can easily happen by chance. Wouldn’t it be useful to take this into account, and weight the results differently when evaluating the model performance?

Absolutely! However, the final goal of the classification determines whether it makes sense to introduce a cost to certain types of classification results. Cost is useful when incorrect predictions into one target class have more serious consequences than incorrect predictions into the other class(es). Or, put another way, correct predictions into one class have more favorable consequences than correct predictions into the other class(es). For example, not detecting a criminal passenger at the airport security control has more serious consequences than mistakenly classifying a non-threatening passenger as dangerous. Therefore, these two types of incorrect predictions should be weighted differently.

No cost is needed if all target classes are equally interesting or important, and the consequences of a wrong prediction into one target class are as bad as they are for the other classes. This is the case when we predict the color of a wine, for example, or the gender of a customer.

From Model Accuracy to Expected Profit

In addition to accuracy statistics, the performance of a classification model can be measured by expected profit. The profit is measured in a concrete unit defined by the final goal of the classification in practice.

When we use classification results in practice, we assign each predicted class a different treatment: Criminal passengers are arrested, non-threatening passengers are let through. Risky customers are not extended credit, creditworthy customers are! And so on. The most desirable classification results produce profit, such as the security of an airport, or the money that a credit institute makes. We measure this profit in a predefined unit such as the number of days without a terror alarm, or euros. The most undesirable results bring about cost — a terror alarm at the airport, or money lost by a bank — and we measure the cost in the same unit as the profit.

Here, we assess the accuracy and expected profit of a credit scoring model, where predicting the creditworthiness of credit applicants has a consequence in terms of profit (or loss): Refusing good credit can cause loss of profit margins (commercial risk). Approving credit for high risk applicants can lead to bad debts (credit risk).

Optimizing Classification Threshold

Our goal here is to find a classification threshold that maximizes the expected profit. A classification model predicts a positive class score for each event in the data, which determines the final class prediction. By default, events are predicted to the positive class if their score is higher than 0.5, and otherwise to the negative class. If we change the classification threshold, we change which events are assigned to the positive and to the negative class. Consequently, the values of accuracy and expected profit change as well.
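As a minimal sketch of this idea, the snippet below applies two different thresholds to the same positive class scores. The function name `predict_with_threshold` and the five scores are made up for illustration; they are not part of the workflow described in this article.

```python
def predict_with_threshold(positive_scores, threshold=0.5):
    """Assign "positive" when the score exceeds the threshold, else "negative"."""
    return ["positive" if score > threshold else "negative"
            for score in positive_scores]

# Hypothetical positive class scores for five events.
scores = [0.10, 0.30, 0.45, 0.60, 0.90]

default_predictions = predict_with_threshold(scores)        # default threshold 0.5
lowered_predictions = predict_with_threshold(scores, 0.27)  # lower threshold

print(default_predictions)
print(lowered_predictions)
```

Lowering the threshold moves the events with scores 0.30 and 0.45 from the negative to the positive class, while the scores themselves stay the same.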

Data Description

In our credit scoring application, we use the well-known German Credit Data Set from the UCI Machine Learning Repository of the University of California, Irvine.

The dataset is composed of 1000 customers. The input variables describe the individual characteristics of the applicants (socio-demographic, financial, and personal) as well as the loan itself, such as the loan amount, the purpose of the loan, and wealth indicators. The target variable is the bank's evaluation of the credit applicant’s creditworthiness (2 = risky, and 1 = creditworthy).

In this dataset, 700 applicants (70%) are classified as creditworthy and 300 (30%) as risky.

We refer to the risky customers as the positive class and the creditworthy customers as the negative class.

Workflow to Produce Expected Profit for Different Classification Thresholds

The workflow in Figure 1 trains a classification model for credit scoring and evaluates the expected profit for different classification thresholds.

Figure 1: Workflow to train a classification model for credit scoring and to produce the expected profit for varying values of the classification threshold. Download the Optimizing Classification Threshold Based on Profit workflow from the KNIME Hub.

The workflow starts with data access and preprocessing. Next, to assess the predictive capabilities of the model, it divides the initial dataset into two tables of equal size, respectively named the training set and the validation set. After that, it trains a logistic regression model on the training set to predict the applicants’ creditworthiness as “risky” or “creditworthy”.

The “Profit by threshold” metanode predicts the creditworthiness in the validation set for varying classification thresholds, starting with a low value of the threshold and increasing it for each iteration.

Finally, the workflow shows the model performance statistics for the different threshold values in an interactive view produced by the “Profit Views” component.

Before we look at the results, we introduce the profit matrix that we need to transform the accuracy statistics into the expected profit.

Profit Matrix

To evaluate misclassification in terms of expected profit, a profit matrix is required that assigns a cost to undesirable outcomes and a profit to desirable ones.

We assign a cost (-1) to the False Negatives, that is, risky applicants who are approved for credit, and a profit (0.35) to the True Negatives, that is, creditworthy applicants who are approved for credit. Table 1 shows the cost and profit values for these classification results in a profit matrix:

The values of cost and profit introduced in Table 1 are based on the following hypothesis [1]: A correct decision of the bank would result in 35% profit at the end of a specific period, say 3–5 years. If the opposite were true, i.e. the bank predicts that the applicant is creditworthy, but it turns out to be bad credit, then the loss is 100%.
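One way to make the profit matrix concrete is to store it as a lookup from (actual class, predicted class) to a profit value. The sketch below takes the values -1 and 0.35 from the text; the two zero entries for rejected applicants are an assumption (no credit granted, so neither gain nor loss), and the names `PROFIT_MATRIX` and `average_profit` are illustrative.

```python
# Profit matrix keyed by (actual class, predicted class).
# -1 and 0.35 come from the article; the zeros for rejected applicants
# are an assumption made for this sketch.
PROFIT_MATRIX = {
    ("risky", "risky"): 0.0,                  # true positive: credit refused
    ("risky", "creditworthy"): -1.0,          # false negative: credit approved, lost
    ("creditworthy", "risky"): 0.0,           # false positive: credit refused
    ("creditworthy", "creditworthy"): 0.35,   # true negative: credit approved, repaid
}

def average_profit(actual, predicted, profit_matrix=PROFIT_MATRIX):
    """Mean profit-matrix value over all (actual, predicted) pairs."""
    total = sum(profit_matrix[(a, p)] for a, p in zip(actual, predicted))
    return total / len(actual)

# Four hypothetical applicants, one per cell of the matrix.
actual = ["risky", "creditworthy", "creditworthy", "risky"]
predicted = ["risky", "creditworthy", "risky", "creditworthy"]
example_profit = average_profit(actual, predicted)
print(example_profit)
```

In this toy example the single approved risky applicant outweighs the single approved creditworthy applicant, so the average profit is negative.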

Calculating Expected Profit (Baseline)

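The following formulas are used to report the model performance in terms of expected profit. For the baseline, in which credit is approved for every applicant, the average profit per applicant is

$$\text{average profit} = 0.35 \cdot (1 - p) - 1 \cdot p$$

where p is the share of the positive (risky) class events of all data. The total amount is

$$\text{total amount} = \text{average profit} \cdot \text{average loan} \cdot n$$

where n is the number of credit applicants.

More generally, assuming that the “risky” class is defined as the positive class, an average profit for a classification model with a profit matrix can be calculated using the following formula:

$$\text{average profit} = \frac{0.35 \cdot TN - 1 \cdot FN}{n}$$

where TN and FN are the numbers of True Negatives and False Negatives, and n is the number of events in the data.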

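Let’s say we have 500 credit applicants with an average loan of 10,000 €. 70% of the applicants are creditworthy and 30% are risky. Then a baseline for the profit statistics without using any classification model, that is, approving credit for every applicant, is calculated as follows:

$$\text{average profit} = 0.35 \cdot 0.7 - 1 \cdot 0.3 = -0.055$$

$$\text{average amount} = -0.055 \cdot 10{,}000\ \text{€} = -550\ \text{€}$$

$$\text{total amount} = -550\ \text{€} \cdot 500 = -275{,}000\ \text{€}$$

If we approve credit for all of the applicants, the expected loss is 275,000 €.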

Calculating Expected Profit (Varying Thresholds)

Next, let’s calculate what the expected profit is when we evaluate the creditworthiness using a classification model and we weigh the outcomes with the profit matrix.

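As described in [2], the break-even threshold for the positive class score can be calculated from the profit matrix as

$$\text{threshold} = \frac{0.35}{0.35 + 1} \approx 0.26$$

Approving credit has a positive expected profit only for applicants whose positive class score is below this value.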

This value can be adjusted empirically as described below.
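The reasoning behind such a break-even value can be sketched by comparing the two possible decisions for a single applicant. The derivation below assumes that rejecting an applicant yields zero profit, and it writes s for the applicant's positive (risky) class score; both the symbol and the zero-profit assumption are introduced here for illustration.

```latex
\begin{aligned}
\mathbb{E}[\text{profit} \mid \text{approve}] &= 0.35\,(1 - s) - 1 \cdot s \\
\mathbb{E}[\text{profit} \mid \text{reject}]  &= 0 \\
0.35\,(1 - s) - s > 0 \;&\Longleftrightarrow\; s < \frac{0.35}{1 + 0.35} \approx 0.26
\end{aligned}
```

Under these assumptions, approving credit is expected to pay off only for applicants whose risky-class score is below roughly 0.26.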

Figure 2 shows the workflow inside the “Profit by threshold” metanode in Figure 1. It iterates over different thresholds to the positive class scores and calculates the corresponding accuracy statistics and profit measures.

The threshold values range from 0 to 1 with a step size of 0.01. The workflow produces the overall accuracy for each value of the threshold by comparing the actual (unaltered in each iteration) and predicted (altered in each iteration) target class values. Furthermore, in order to calculate the expected profit, it weighs the classification results from each iteration by the values in the profit matrix.
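The loop described above can be sketched in a few lines of Python. This is not the KNIME workflow itself: the function name `sweep_thresholds` is hypothetical, rejected applicants are assumed to contribute zero profit, and the scores are synthetic stand-ins for the output of a real model on a validation set.

```python
import random

PROFIT_TN = 0.35   # creditworthy applicant approved (from the profit matrix)
COST_FN = -1.0     # risky applicant approved (from the profit matrix)

def sweep_thresholds(is_risky, risky_scores, n_steps=101):
    """Return (threshold, accuracy, average profit) for thresholds 0, 0.01, ..., 1."""
    n = len(is_risky)
    rows = []
    for i in range(n_steps):
        threshold = i / (n_steps - 1)
        # Predicted class changes with the threshold; actual class does not.
        predicted_risky = [score > threshold for score in risky_scores]
        pairs = list(zip(predicted_risky, is_risky))
        correct = sum(pred == act for pred, act in pairs)
        # Only approved applicants (predicted creditworthy) earn or lose money.
        true_negatives = sum(not pred and not act for pred, act in pairs)
        false_negatives = sum(not pred and act for pred, act in pairs)
        avg_profit = (PROFIT_TN * true_negatives + COST_FN * false_negatives) / n
        rows.append((round(threshold, 2), correct / n, avg_profit))
    return rows

# Synthetic validation data: about 30% risky applicants, with scores loosely
# related to the true class. These numbers are made up for illustration.
rng = random.Random(0)
is_risky = [rng.random() < 0.30 for _ in range(500)]
risky_scores = [min(1.0, max(0.0, 0.25 + 0.30 * risky + rng.gauss(0, 0.2)))
                for risky in is_risky]

rows = sweep_thresholds(is_risky, risky_scores)
best_by_accuracy = max(rows, key=lambda row: row[1])
best_by_profit = max(rows, key=lambda row: row[2])
print("best by accuracy:", best_by_accuracy)
print("best by profit:  ", best_by_profit)
```

Because a False Negative costs almost three times what a True Negative earns, the threshold that maximizes profit does not have to coincide with the one that maximizes accuracy.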

Figure 3 shows the output table of this workflow:

Figure 3: The accuracy statistics and profit measures for varying classification thresholds.

Every row corresponds to a value of the classification threshold, together with the corresponding model accuracy statistics and profit measures as average profit per applicant, average amount per applicant, and total average amount.
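The three profit measures in each row differ only by scaling. A minimal sketch, assuming the example portfolio from the baseline calculation (500 applicants, an average loan of 10,000 €) and a hypothetical average profit of 0.10 per applicant; the function name `profit_measures` is illustrative:

```python
def profit_measures(avg_profit_per_applicant, average_loan, n_applicants):
    """Scale a unitless average profit to money per applicant and in total."""
    avg_amount = avg_profit_per_applicant * average_loan
    total_amount = avg_amount * n_applicants
    return avg_amount, total_amount

# Hypothetical row with an average profit of 0.10 per applicant.
avg_amount, total_amount = profit_measures(0.10, 10_000, 500)
print(f"{avg_amount:,.0f} EUR per applicant, {total_amount:,.0f} EUR in total")
```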

Results

The interactive view in Figure 4 shows the output of the “Profit Views” component. It visualizes the accuracy and profit measures for the varying classification thresholds.

The line plots show how four different model performance indicators develop if the value of the classification threshold increases from 0 to 1. The performance indicators are:

Overall accuracy (line plot in the top left corner)
Total average amount (line plot in the top right corner)
Average profit per applicant (line plot in the bottom left corner)
Average amount per applicant (line plot in the bottom right corner).

Based on an empirical evaluation, the optimal threshold is 0.51 in terms of overall accuracy, and 0.27 in terms of expected profit.

Table 2 represents the target class distribution (top 2 rows) and the overall accuracy and average profit per applicant (bottom 2 rows) of the credit scoring model, using no model (1st column on the left), and the default and optimized threshold values (3 columns on the right):

Using the optimized threshold 0.27, we can reach a profit of 0.113 per applicant. With an average loan of 10,000 €, this gives an average amount of 1,130 € per applicant (0.113 × 10,000 €) and, based on 500 applicants, a total average amount of 565,000 €.

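The advantage of using a model is evident: an expected profit of 565,000 € with the optimized threshold, versus an expected loss of 275,000 € (0.35 × 0.7 − 0.3 = −0.055 per applicant) when every applicant is approved.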

References

[1] Wang, C., & Zhuravlev, M. An analysis of profit and customer satisfaction in consumer finance. Case Studies in Business, Industry and Government Statistics, 2(2), pages 147–156, 2014.

[2] Elkan, C. The foundations of cost-sensitive learning. In Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence, pages 973–978, 2001.


As previously published on the KNIME Blog: https://www.knime.com/blog/from-modeling-to-scoring