Active Learning on Fraud Detection

Miguel Leite
Nov 27, 2019

The Summer of ‘19

One year before finishing my Master's, I decided to spend my summer learning and gaining experience in the fields I’m focusing on: data science and machine learning.

Now that summer is over, I realize this was an excellent decision. But that’s because I was lucky enough to end up in one of Feedzai’s summer internship programs. Three months have passed since all the interns, who were allocated to a great diversity of projects (not only in data science and machine learning), were cheerfully welcomed by everyone at the office.

Before getting all techy, I must thank Feedzai for providing me with such an amazing experience. I have learned at a remarkably fast pace, and I never felt like I was “just” an intern; instead, I was always treated and trusted like other members of the team. Without further ado, let's get to the point of this post.

One of the things Feedzai did that I admire the most was that they assigned interns to projects of actual relevance. In my case, I joined the Research Data Science team, where I was challenged to study and experiment with the application of active learning to fraud detection.

What is active learning?

Well, imagine an entity that processes payments (e.g., a bank): they likely have hundreds of thousands of clients making millions of transactions per day. You suspect that a very small portion of those transactions is fraudulent — anything between 0.1% and 3% — but you have no idea which ones (for some reason fraudsters refuse to admit they are committing fraud).

Without a machine learning model, there are two main ways to identify which of those transactions are fraudulent:

  • Chargebacks — these occur when clients report transactions they didn't make, and it is assumed the account or card details were stolen. Chargebacks would be a perfect source of labels if it weren't for one huge drawback: they may only arrive days or weeks after the transaction has occurred, or never at all.

  • Manual review — fraud analysts inspect individual transactions and label them as fraudulent or legitimate. This gives immediate, reliable labels, but analysts' time is scarce and expensive.

If you wish to build a machine learning model to classify transactions, ideally you would leverage both methods.

But which transactions would you send to the analysts? Every single one of them? Poor analysts… Especially since 90% of their time, at the very least, would be spent reviewing similar transactions. In effect, there would never be enough people for such a task. So, how can you pick which transactions analysts should be looking at? How can we go about creating, as we call it, an intelligent review queue?

That’s where active learning comes in. Active learning is the field that covers the following situation:

  • You don’t have any labels at all for your data (or very few);

  • Obtaining a label is costly — here, an analyst’s review time;

  • You must therefore choose which instances to label so that the model improves as quickly as possible.
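To make the idea concrete, here is a minimal sketch of an active learning loop using uncertainty sampling, one common strategy. All function names and signatures are hypothetical stand-ins, not Feedzai's implementation: `oracle` plays the analyst, `train` fits a model, and `score` returns a fraud probability.

```python
def uncertainty(score):
    # Distance from the decision boundary (0.5): smaller = more uncertain.
    return abs(score - 0.5)

def active_learning_loop(pool, oracle, train, score, budget):
    """Query the `budget` most uncertain transactions, one at a time.

    `oracle(tx)` plays the analyst and returns the true label of `tx`;
    `train(labeled)` fits a model on (tx, label) pairs;
    `score(model, tx)` returns the model's fraud probability for `tx`.
    """
    labeled = []                      # (transaction, label) pairs so far
    model = train(labeled)            # start from a no-label model
    pool = list(pool)
    for _ in range(budget):
        # Send the transaction the model is least sure about to "review".
        pick = min(pool, key=lambda tx: uncertainty(score(model, tx)))
        pool.remove(pick)
        labeled.append((pick, oracle(pick)))
        model = train(labeled)        # retrain on the grown labeled pool
    return model, labeled
```

The key design point is the `min` over uncertainty: instead of labeling everything, the policy spends each unit of analyst time on the transaction it expects to learn the most from.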

The Flow

It’s important to understand how transactions would flow within an active learning framework. Let’s analyze the components in the following diagram:

Fig. 1: The Active Learning Flow
  1. The client’s transactions happen in real-time and are represented by the transaction stream;
  2. The machine learning model scores each incoming transaction;
  3. The active learning policy uses those scores to decide which transactions go into the review queue;
  4. The analysts review the queued transactions and provide labels;
  5. The newly labeled transactions join the labeled data pool, on which the model is retrained.

Disclaimer: At such a preliminary phase of this research, we can’t actually send transactions to the analysts, because we can’t afford that much analyst time for each experiment. Given that we have labels for all the data, what happens instead is the following:

  1. At first, all transactions are treated as if they are still unlabeled;
  2. When the policy selects a transaction for “review”, its stored label is revealed, simulating the analyst’s answer;
  3. The revealed labels grow the labeled pool exactly as real analyst feedback would.
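A simulated analyst of this kind can be sketched in a few lines. This is my own illustrative stand-in, not the actual experiment code: the true labels are known up front but kept hidden, and one is revealed only when a transaction is actually queried.

```python
class SimulatedAnalyst:
    """Stands in for the review team in offline experiments: the true
    labels are known in advance but hidden, and one is revealed only
    when the policy queries that transaction."""

    def __init__(self, true_labels):
        self._true = dict(true_labels)   # tx_id -> label, hidden from the policy
        self.reviews = 0                 # how much "analyst time" was spent

    def review(self, tx_id):
        self.reviews += 1
        return self._true[tx_id]         # the simulated analyst answers instantly
```

Counting `reviews` also gives a natural budget measure: the number of labels a policy needed to reach a given performance.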

Measuring the policy’s performance

We need to find some way to know how accurate the machine learning model is each time the labeled data pool grows (i.e., after new transactions are reviewed by the analysts).

In a real use case, the lack of labeled data keeps us from having a test set to evaluate model performance. Nevertheless, we are still at a very early stage of the project and we want to know how different active learning policies perform, so we split our dataset in half and, since the data is time-dependent, use the later split as the test set. Furthermore, in a real situation we would tune our active learning policy on a dataset from a similar domain, so this analysis is still relevant.
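A time-based split like this is straightforward; the sketch below (function name and field names are my own) just orders by timestamp and cuts the dataset in half, so the test set never leaks future information into the simulation.

```python
def time_split(transactions, timestamp_key="timestamp"):
    """Split a time-dependent dataset in half: the earlier half feeds
    the active learning simulation, the later half is held out as the
    test set (never shown to the policy)."""
    ordered = sorted(transactions, key=lambda tx: tx[timestamp_key])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]
```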

Thus, our experiments are organized as follows:

  1. New transactions come “from the analysts” with labels;
Fig 2: The Experiments Flow

Our Baselines

How well is an active learning policy doing? To answer that question, we must have something to compare it with. So, we established two simple baselines:

  1. The pessimistic baseline — the active learning policy simply sends transactions to the analysts in a completely random manner. We definitely don’t want to do worse than that, right?

  2. The optimistic baseline — a model trained with every label in the dataset, as if the analysts had somehow reviewed all 150k transactions. We can’t expect an active learning policy to do better than that.
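The pessimistic baseline amounts to a one-liner. Here is a rough illustration (the function and its signature are my own, not the experiment code); the explicit seed is what makes a single run reproducible, which matters below when we rerun the experiment many times.

```python
import random

def random_policy(pool, budget, seed=None):
    """Pessimistic baseline: pick `budget` transactions for review
    uniformly at random, ignoring the model's scores entirely."""
    rng = random.Random(seed)          # a fixed seed makes the run reproducible
    return rng.sample(list(pool), budget)
```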

In this plot, you can see a run of our pessimistic baseline. The blue line represents the performance of the model throughout the querying process. The dash-dotted horizontal red line is the optimistic baseline. And it appears that the model stabilizes there once it has nearly 4k labels.

Fig. 3: Pessimistic and Optimistic Baselines

Not so bad right? The model trained with 4k transactions is as good as the optimistic baseline, which was trained with 150k transactions. But don’t forget this was a random run, which means it could have been worse. It might just depend on how lucky you are.

So, how do we visualize the margin between the worst and best-case scenarios? Easy (but computationally expensive)! We rerun the same experiment you have seen above hundreds of times, changing nothing but the random seed, and then aggregate all those performance curves into a single distribution plot.

In the case of random sampling (a.k.a., the pessimistic baseline), the result is this:

Fig. 4: Distribution of Random Policy Runs
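Aggregating the runs into a distribution plot can be sketched as follows. This is a pure-Python stand-in with names of my own choosing: each run produces a performance curve, and for every querying step we summarize the spread across runs with a few quantiles (nearest-rank, for simplicity).

```python
def distribution_bands(run_experiment, seeds, quantiles=(0.05, 0.5, 0.95)):
    """Repeat the same experiment across random seeds and summarize,
    at each querying step, the spread of the performance curves.

    `run_experiment(seed)` returns one performance curve (a list of
    scores, one per querying step). The result is one quantile tuple
    per step, ready to plot as shaded bands around the median."""
    curves = [run_experiment(seed) for seed in seeds]
    bands = []
    for step in range(len(curves[0])):
        values = sorted(curve[step] for curve in curves)
        # Nearest-rank quantiles over all runs at this step.
        bands.append(tuple(values[int(q * (len(values) - 1))] for q in quantiles))
    return bands
```

Plotting the low and high quantiles as a shaded band around the median line is what produces figures like the one above.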

What we can deduce from this plot is:

  • The model usually works well once it already has access to 4k labeled transactions, as we had seen in the plot of figure 3;

  • The spread between the best and worst runs is wide, which confirms that random sampling depends heavily on how lucky each run happens to be.

What we achieved so far

After going through the current state-of-the-art methods in active learning (if you are interested, I recommend this paper to get started), we implemented several active learning methods, some solely based on the current literature and some with a few tweaks of our own.

After a lot of experiments, the best result we achieved is the following:

Fig. 5: Best Active Learning Policy (for now)

We managed to make the distributions much “thinner” and to make at least 83% of the runs stabilize at around 1k transactions.

Yet, there are still a few scenarios where the model obtains a lower score around 2k transactions, which is not yet optimal.

We achieved promising results that make us believe we are on the right path. We have many ideas related to active learning policies that still need to be tested, and we believe that we’ll find a robust one that always achieves stable top performance at 2k transactions (at least!).

I hope this post provides you with some insights on active learning and the methods we used to evaluate its policies.

Thanks, Feedzai

This post is not about Feedzai. However, I cannot end things without congratulating its people for creating such an amazing culture. It’s crystal clear that the company cares about the well-being of all its employees, and despite the fact that everyone is free to manage their work in a way that best works for them, you can sense that Feedzaians share the same grit and desire to tackle fraud and push Feedzai to even greater heights.

Fig. 6: Every intern and our mentors
