Cost-sensitive classification in fraud prevention (the code)

Emanuele Luzio, Ph.D.
Mercado Libre Tech

--

In the first part of this two-part article, we described a procedure to calibrate a binary classifier whose results are very unbalanced in terms of type I and type II errors. In this article, we are going to explore the details of the code behind the formulas of the first part, but before that we will recall the concepts presented there.

The ROC-utility calibration

When we train a classifier, every prediction comes with a score that is taken as a likelihood of the outcome. In general this is not the case, since the training process returns only an approximation of the true model. Most of the time we can overcome this by calibrating the model with the ROC curve. We can express the utility of our decision in terms of the true-positive rate and the false-positive rate:

Utility formula in terms of true-positive rate and false-positive rate.

This formula tells us that utility is a linear function of the true-positive rate and the false-positive rate. Thus, we can express the true-positive rate in terms of the false-positive rate and the utility:

True-positive-rate in terms of false-positive-rate and utility U.
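Since the original formula images did not survive, here is a plausible reconstruction in standard notation, where π is the frequency of positive examples and u_tp, u_fn, u_fp, u_tn are the entries of the utility matrix (the exact notation in the original may differ):

```latex
% Utility as a linear function of TPR and FPR:
U = \pi \left( u_{tp}\,\mathrm{TPR} + u_{fn}\,(1 - \mathrm{TPR}) \right)
  + (1 - \pi) \left( u_{fp}\,\mathrm{FPR} + u_{tn}\,(1 - \mathrm{FPR}) \right)

% Solving for TPR gives a straight line in ROC space for every fixed U:
\mathrm{TPR} = \frac{U - \pi\, u_{fn} - (1 - \pi)\left( u_{fp}\,\mathrm{FPR} + u_{tn}\,(1 - \mathrm{FPR}) \right)}{\pi \,(u_{tp} - u_{fn})}
```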

We can plot this curve in the ROC space and get:

where the red line indicates the U=0 utility and the green line indicates the max utility we can obtain with our model.

Bayesian minimum risk

If we are confident that our model returns the true likelihood of an event, we can opt for a Bayesian approach. We compute the utility associated with every action we can take and sort them. Our prediction is the action with the greatest associated utility.

Let’s look at an example. Suppose we can take only two actions: accept (U- in the plot) or reject (U+ in the plot) a payment. In this specific case we can see that only for a few values of the predicted score is U- greater than U+ (U- > U+). That is where we must accept the payment.

The code

Now that we have had an overview of what we are talking about, let’s get down to work. I have put together a small Python library so that developers can play around with these concepts. You can find it on GitHub at https://github.com/Tokukawa/cost-sensitive-calibration. The package is pip-installable, so we can start by opening a Jupyter notebook and typing:
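The original install snippet was not preserved; assuming the package is published under the repository name (if not, the second line installs it directly from GitHub):

```shell
# Install from PyPI, assuming the package shares the repository name...
pip install cost-sensitive-calibration

# ...or install straight from the GitHub repository:
pip install git+https://github.com/Tokukawa/cost-sensitive-calibration.git
```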

We wrote the package for Python 3, but it should also work with Python 2.7. If you experience problems during execution, you might consider switching to Python 3. Now let’s create a simple example:
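The original data-generation snippet did not survive the export; here is a hypothetical stand-in that builds a toy fraud dataset, with scores that are informative but imperfect so calibration has something to do:

```python
import random

random.seed(0)

# Hypothetical toy data: 1000 transactions, roughly 10% fraudulent.
labels = [1 if random.random() < 0.1 else 0 for _ in range(1000)]

# Fraudulent examples get scores drawn around 0.7, legitimate ones
# around 0.3, clipped to [0, 1], so the classifier is good but not perfect.
preds = [min(1.0, max(0.0, random.gauss(0.7 if y == 1 else 0.3, 0.2)))
         for y in labels]
```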

preds is the variable containing our predictions and labels is the variable containing our true labels. Say our utility matrix is like this:

Utility matrix

The utility matrix must be expressed as per-dollar returns, which means that -1.0 is the same as a -100% loss, +0.5 means a +50% gain, and so on. Now we can perform the ROC-utility calibration in this way:
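To make the per-dollar semantics concrete, here is a small sketch of how the expected utility per transaction follows from the four rates; the matrix values below are illustrative, not the ones from the original article:

```python
# Hypothetical per-dollar utility matrix for a fraud classifier:
u_tp = 0.00   # fraud correctly rejected: nothing lost, nothing gained
u_fn = -1.00  # fraud accepted: the full amount is lost (-100%)
u_fp = -0.05  # good payment rejected: fees and lost goodwill (-5%)
u_tn = 0.02   # good payment accepted: the margin earned (+2%)

def expected_utility(tpr, fpr, pos_rate):
    """Per-transaction utility as a linear function of tpr and fpr."""
    neg_rate = 1.0 - pos_rate
    return (pos_rate * (u_tp * tpr + u_fn * (1.0 - tpr))
            + neg_rate * (u_fp * fpr + u_tn * (1.0 - fpr)))

# A model that catches 80% of fraud while rejecting 5% of good payments,
# with 1% of transactions being fraudulent:
u = expected_utility(tpr=0.8, fpr=0.05, pos_rate=0.01)
```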

In the example provided, the result looks like this:

Optimal Threshold: 0.316255844096
Max Utility: 0.074900

Setting “plot_roc=True” will show the ROC curve:

The red line represents the U=0 utility and the green line the max utility you can reach with this model. If the green line lies below the red line, your model is not good enough and you should find a better one. So, what is happening under the hood? The class “BinaryCalibration” contains the following definition of the calibration method:
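The original listing did not survive the export; below is a self-contained pure-Python sketch of the same steps. The real library obtains fpr, tpr, and th from sklearn.metrics.roc_curve (a minimal equivalent is inlined here), and the utility-matrix layout and function signature are assumptions:

```python
def roc_points(preds, labels):
    """Minimal stand-in for sklearn.metrics.roc_curve: returns
    (fpr, tpr, thresholds), one point per candidate threshold."""
    order = sorted(range(len(preds)), key=lambda i: -preds[i])
    pos = sum(labels)
    neg = len(labels) - pos
    fpr, tpr, th = [0.0], [0.0], [float("inf")]
    tp = fp = 0
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        tpr.append(tp / pos)
        fpr.append(fp / neg)
        th.append(preds[i])
    return fpr, tpr, th

def calibrate(preds, labels, util_matrix):
    """Sketch of the calibration: assumed layout
    util_matrix = [[u_tp, u_fn], [u_fp, u_tn]], in per-dollar returns,
    with u_tp > u_fn (catching a positive beats missing it)."""
    (u_tp, u_fn), (u_fp, u_tn) = util_matrix
    fpr, tpr, th = roc_points(preds, labels)
    pi = sum(labels) / len(labels)  # frequency of positive examples

    # tpr of the U = 0 line at each fpr (solve the utility formula for tpr):
    tpr_zero = [(-pi * u_fn - (1 - pi) * (u_fp * f + u_tn * (1 - f)))
                / (pi * (u_tp - u_fn)) for f in fpr]

    # delta: vertical distance between the ROC curve and the U = 0 line.
    delta = [t - t0 for t, t0 in zip(tpr, tpr_zero)]

    # Index of the largest distance; since utility is proportional to
    # delta, the same index also gives the max utility.
    max_delta_index = max(range(len(delta)), key=delta.__getitem__)
    max_util = pi * (u_tp - u_fn) * delta[max_delta_index]
    return th[max_delta_index], max_util
```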

roc_curve is the scikit-learn function that returns the false-positive rate fpr and the true-positive rate tpr at various thresholds th. With this info and the frequency of positive examples, we can perform the actual estimation of the optimal threshold. This is happening here:

We measure the distance between the ROC curve and the U=0 utility line at the same fpr values in this line of code:

Where delta is this distance:

Then we find the index of the largest value:

With this information, we can find the value of the max utility max_util and the optimal threshold th[max_delta_index].

And that is pretty much everything for the ROC-utility optimization. Can we use Bayesian minimum risk with this package? You can, but be warned: this is a toy model, and you had better use CostCla. So let’s see what happens with Bayesian minimum risk in the lion example from the previous article. The example was:

...Say you are walking alone in the savannah, trying to reach the next village. Suddenly, you hear a noise coming from behind. It is very likely a noise caused by the wind, but what if it is a lion lurking in the bushes? If you decide it is a mortal danger and grab your weapon, two outcomes are possible: if you are wrong, nothing happens except a surge of adrenaline in your body; if you are right, you have a chance to survive. However, if you decide it is not a mortal danger and continue walking, and you are wrong, you are dead.

For this example, we assumed the utility matrix could be something like this:

We can see what the optimal solution with a Bayesian approach is:
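The original snippet is gone; as a sketch, pick hypothetical per-unit utilities in which death is catastrophically negative (the numbers below are illustrative, not the article’s) and take the action with the greatest expected utility:

```python
# Hypothetical utilities, keyed by action, with columns [lion, just wind]:
#   "grab weapon":  a small adrenaline cost if it was only wind,
#                   roughly neutral if it was a lion (you may survive);
#   "keep walking": death if it was a lion, no cost if it was wind.
UTIL = {
    "grab weapon":  {"lion": 0.0,    "wind": -1.0},
    "keep walking": {"lion": -1e10,  "wind":  0.0},
}

def best_action(p_lion):
    """Bayesian minimum risk: pick the action with max expected utility."""
    expected = {
        action: u["lion"] * p_lion + u["wind"] * (1.0 - p_lion)
        for action, u in UTIL.items()
    }
    return max(expected, key=expected.get)

# Even at one-in-a-billion odds of a lion, the catastrophic downside of
# being wrong dominates the decision:
decision = best_action(1e-9)
```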

With the instance binary_bayesian_classifier, what happens when the likelihood that the noise is that of a lion is one in a billion?

So as stated in the previous article, the answer is:

Just in case, run!

As we did for the ROC-utility calibration, we can take a look at what happens under the hood. The class “BinaryBayesianMinimumRisk” has a method “predict”:

that computes the Bayesian utility for every possible action, “np.sum(self.util_matrix * np.array([[pred, 1 - pred]]), axis=1)”, and then finds the largest value with “np.argmax(..)”.
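Rewritten without NumPy as a self-contained sketch (the real method is the quoted one-liner; the utility-matrix layout assumed here is one row per action, with columns weighting the positive and negative outcomes):

```python
class BinaryBayesianMinimumRisk:
    """Pure-Python sketch of the library class of the same name; the real
    predict uses np.sum(self.util_matrix * np.array([[pred, 1 - pred]]),
    axis=1) followed by np.argmax."""

    def __init__(self, util_matrix):
        # One row per action; columns weight [P(positive), P(negative)].
        self.util_matrix = util_matrix

    def predict(self, pred):
        # Bayesian utility of each action...
        utils = [row[0] * pred + row[1] * (1.0 - pred)
                 for row in self.util_matrix]
        # ...then the index of the largest one.
        return max(range(len(utils)), key=utils.__getitem__)

# Hypothetical lion matrix: row 0 = treat it as a lion, row 1 = keep
# walking; columns weight [P(lion), P(wind)].
clf = BinaryBayesianMinimumRisk([[0.0, -1.0], [-1e10, 0.0]])
action = clf.predict(1e-9)  # 0: treat the noise as a lion
```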

So, this is pretty much everything. We have seen how to use the class BinaryCalibration, which implements the ROC-utility optimization, and what happens under the hood. We then explored the class BinaryBayesianMinimumRisk in the same way. I hope you have enjoyed this journey as much as I have; see you in the next article.

--


Emanuele is a physicist by training. He works as a tech lead data scientist at Mercado Libre. His passions are machine learning, financial markets, and sailing.