Fraud Detection: Modeling User Behavior

Alex Ravaglia
Data Reply IT | DataTech
8 min read · Dec 21, 2022

Introduction

In this article, I will show how to approach the problem of fraud detection by basing the analysis on customer behavior. Starting from raw credit card transactions, the goal is to generate data that describes each customer's behavior and highlights important patterns. Once that behavior is extracted, it is possible to train a machine learning model that can tell whether a bank transaction is legitimate or not.

Why modeling user behavior

A fraud detection task could be solved at the transaction level, analyzing each transaction in isolation. In that case, the underlying assumption is that fraudulent transactions are more similar to each other than to legitimate ones, and vice versa.

But users can behave very differently, and what counts as a legitimate transaction varies from user to user. To detect fraud, it is more interesting to relate a user's transaction to their own past operations rather than to other users' activities. Modeling user behavior lets us move from "detect whether the incoming transaction is a fraud" to "detect whether the incoming transaction is a fraud for user XYZ".

Not all users are equal

The assumption is that each person acts according to their habits, so it is possible to find recurrent patterns in their actions. If, at a certain point, some actions are very different from the usual behavior, it could be that someone else is acting in their place and the account has been defrauded. It is more intuitive to check whether an action resembles the user's past operations than to evaluate it without that context.

How to model user behavior

Once we have understood why it is important to train the model on user behavior, we need a way to extract this information from the data and use it. First of all, it is necessary to group all the transactions at the user level and summarize them to extract statistics (e.g. mean and mode) that give a representation of the user's habits. This information is then combined with the incoming transaction, and the result is represented as a deviation from the usual behavior.

Relating the statistics computed at the user level to the incoming transaction can be done with a feature-engineering process. The new features are added to the target transaction and are derived from the relation between the target operation and the user's other transactions.

In this way, each enriched transaction will embed 2 types of information:

  • Operational data of the transaction
  • How the transaction relates to the user's other operations
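A minimal sketch of this aggregation-and-enrichment step with pandas, assuming a hypothetical transaction log with columns user_id, amount, device_type, and timestamp (the real dataset uses different, anonymized field names):

```python
import pandas as pd

# Hypothetical transaction log; column names and values are illustrative only.
transactions = pd.DataFrame({
    "user_id": ["XYZ", "XYZ", "XYZ", "ABC"],
    "amount": [50.0, 30.0, 45.0, 1000.0],
    "device_type": ["android", "android", "android", "ios"],
    "timestamp": pd.to_datetime([
        "2022-11-01 10:00", "2022-11-10 14:30",
        "2022-11-20 09:15", "2022-11-05 03:00",
    ]),
})

# Summarize each user's history: mean spend, habitual device, activity volume.
user_profile = transactions.groupby("user_id").agg(
    mean_amount=("amount", "mean"),
    habitual_device=("device_type", lambda s: s.mode().iloc[0]),
    n_transactions=("amount", "size"),
)

# Enrich an incoming transaction with its deviation from the user's profile.
incoming = {"user_id": "XYZ", "amount": 10_000.0, "device_type": "ios"}
profile = user_profile.loc[incoming["user_id"]]
enriched = {
    **incoming,
    "amount_vs_mean": incoming["amount"] / profile["mean_amount"],
    "habitual_device_used": incoming["device_type"] == profile["habitual_device"],
}
print(enriched)
```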

Here is an example of the information that can be extracted from the operational data of the transaction:

The operation is done by user XYZ in Milan at 3:00 a.m. from an iPhone spending 10,000$.

This is, instead, an example of information that can be extracted from the relation between the operational data and the user’s history:

The operation is done in Milan by user XYZ, who lives in London. The device is an iPhone, but the user usually uses an Android. The operation is done at 3:00 a.m., and in the last 6 months he never spent money between 2:00 a.m. and 6:00 a.m. The transaction has a total spend value of 10,000$, while the user's mean amount per transaction in the last 30 days is 50$, and in the last 7 days is 30$.

Feature engineering for capturing the user behavior

Each new feature is computed by aggregating the transactions at the user level, and each one captures an aspect of the user's behavior. To design the types of features to generate, three macro groups of analysis have been identified:

  • On time-window
  • On the frequency of the activity
  • On the deviation from the mean value

The feature-engineering process generates 19 new features, grouped below by the aspect of user behavior they capture.

Frequency of user’s activity

These features describe the frequency of the user's activity, also in relation to the amounts involved. Similar amounts are grouped into 3 spending ranges. These features can detect frequency anomalies (a possible implementation follows the list below).

  • Number of transactions in the last month/week
  • Number of transactions in the last month/week with the amount in range 1/ range 2/ range 3
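A possible implementation of these counts with pandas, assuming hypothetical column names (timestamp, amount) and illustrative spending ranges (the actual ranges are not specified in the article):

```python
import pandas as pd

def frequency_features(history: pd.DataFrame, tx_time: pd.Timestamp) -> dict:
    """Count one user's past transactions in recent time windows.

    `history` holds the user's previous transactions with hypothetical
    columns `timestamp` and `amount`; the spending ranges are illustrative.
    """
    ranges = {
        "range_1": (0, 100),
        "range_2": (100, 1_000),
        "range_3": (1_000, float("inf")),
    }
    feats = {}
    for label, days in [("week", 7), ("month", 30)]:
        window = history[
            (history["timestamp"] >= tx_time - pd.Timedelta(days=days))
            & (history["timestamp"] < tx_time)
        ]
        feats[f"n_tx_last_{label}"] = len(window)
        for name, (lo, hi) in ranges.items():
            in_range = window[(window["amount"] >= lo) & (window["amount"] < hi)]
            feats[f"n_tx_last_{label}_{name}"] = len(in_range)
    return feats
```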

The amount involved in the transactions

These features measure how far the transaction amount deviates from the user's usual spending. They can detect anomalies related to unusually high expenses (a sketch follows the bullet below).

  • The amount of this transaction exceeds the mean amount of the last week/month by more than +X%
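A possible implementation of this deviation feature, again with hypothetical column names; the history DataFrame is assumed to hold only the target user's past transactions:

```python
import pandas as pd

def amount_deviation_features(history: pd.DataFrame,
                              tx_time: pd.Timestamp,
                              amount: float) -> dict:
    """Express the incoming amount as a % deviation from the user's recent mean.

    An empty window (no recent history) yields a neutral value of 0.
    """
    feats = {}
    for label, days in [("week", 7), ("month", 30)]:
        window = history[
            (history["timestamp"] >= tx_time - pd.Timedelta(days=days))
            & (history["timestamp"] < tx_time)
        ]
        mean_amount = window["amount"].mean()
        if pd.isna(mean_amount) or mean_amount == 0:
            feats[f"pct_over_mean_{label}"] = 0.0
        else:
            feats[f"pct_over_mean_{label}"] = 100.0 * (amount - mean_amount) / mean_amount
    return feats
```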

Credit card | device | seller & purchaser information

These features indicate whether some of the operational information of the transaction matches the user's past transactions. They can detect changes in habits related to devices, credit cards, or seller and purchaser information (a sketch follows the list below).

  • Was the habitual device OS used?
  • Was the habitual browser used?
  • Was the habitual type of device used?
  • Was the habitual device model/brand used?
  • Is the seller/purchaser reference the habitual one?
  • Check whether 2 different features (“card4” & “card6”) always have the same value
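A possible implementation of these checks, where the habitual value is taken as the mode of the user's history; the column names below are hypothetical stand-ins for the anonymized fields of the real dataset (e.g. card4, card6):

```python
import pandas as pd

def habit_features(history: pd.DataFrame, incoming: dict) -> dict:
    """Flag whether the incoming transaction matches the user's habitual values.

    The categorical columns (`os`, `browser`, `device_type`, `device_brand`,
    `seller_id`) are illustrative placeholders, not the dataset's real names.
    """
    feats = {}
    for col in ["os", "browser", "device_type", "device_brand", "seller_id"]:
        modes = history[col].mode()
        habitual = modes.iloc[0] if not modes.empty else None
        feats[f"habitual_{col}"] = int(incoming[col] == habitual)
    return feats
```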

Important disclaimer about the relation between the newly generated features and fraud occurrences

It is very important to understand that we are NOT telling the model that a transaction is a fraud whenever its device is not the habitual one or its amount is much higher than the average. We are just enriching each transaction with data that relates it to the user's history. IT WILL BE THE MACHINE LEARNING MODEL that infers, during the training process, which of the added features are most relevant for detecting fraud.

Training and generalization of the model

We want to train the model so that it can recognize whether an incoming transaction of a specific user is a fraud or not. One option is to split each user's transactions between the Training and Test sets. In this way, the model is trained with data from all the users, and the system will be good at classifying the transactions of known customers, the ones whose data was used during the training process.

Is it possible to extend the inference to new users?

Yes, since the classification is not specific to the user. The model is based on the relation between the target transaction and all of the user's previous operations, so it does not consider any kind of personal user information in the process. Because we enrich the target transaction with the information of this relation, the model infers whether a transaction is legitimate or fraudulent based on that new information. This means that the model learns which correlations are legitimate and which are not, regardless of the user's private details.

Let’s make an example:

UserA usually spends 1000$/transaction a week in Paris. The incoming operation is done in Paris for an amount of 1170$.

UserB usually spends 30$/transaction a week in Rome. The incoming operation is done in Rome for an amount of 40$.

UserC usually spends 200$/transaction a week in Berlin. The incoming operation is done in Paris for an amount of 1000$.

UserA and UserB are very different: one operates in Paris and usually spends a lot, the other operates in Rome and spends far less. The feature-engineering process highlights the relationship between the transaction and the user's behavior, not the user's specific information itself. This means that the features computed for UserA will be similar to those computed for UserB because, for both of them, the incoming transaction is close to their habitual behavior. UserC performed an operation quite similar to UserA's, but unlike for UserA, for UserC this could be an anomaly because it is far from his habits.

Thanks to the feature-engineering process, we try to capture the different types of fraudulent and legitimate relations that can exist. The process is not user-centric but focuses on the relation between a transaction and the user's other transactions. For this reason, the dataset can also be split by user: a percentage of the users populates the Training set and the remaining users populate the Test set. In this way, the model generalizes and is able to classify transactions of new users as well (see the sketch below).

Of course, for each new user it is necessary to collect some historical data first.
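A minimal sketch of this user-based split, assuming a pandas DataFrame of enriched transactions with a hypothetical user_id column:

```python
import numpy as np
import pandas as pd

def split_by_user(transactions: pd.DataFrame,
                  train_frac: float = 0.8,
                  seed: int = 42):
    """Put a fraction of the users (with all their transactions) in the
    training set and the remaining users in the test set, so the test set
    simulates completely unseen customers."""
    rng = np.random.default_rng(seed)
    users = transactions["user_id"].unique()
    rng.shuffle(users)
    n_train = int(len(users) * train_frac)
    train_users = set(users[:n_train])
    is_train = transactions["user_id"].isin(train_users)
    return transactions[is_train], transactions[~is_train]
```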

Dataset composition and ML model

The dataset is from Kaggle and is provided by Vesta. Each bank transaction is composed of many features, but for privacy reasons only a small percentage of them is human-readable and reliable for our purpose. Unluckily, the transactions do not carry a user identifier, so for this work that information is generated. The dataset also does not contain fraud transactions, so the fraud labels are added artificially: the idea is that a fraud is an operation assigned to a user who did not actually perform it.
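Purely as an illustration of that idea (not the exact procedure used for this work), synthetic fraud labels could be generated by re-assigning a small fraction of transactions to users who did not perform them:

```python
import numpy as np
import pandas as pd

def inject_synthetic_frauds(transactions: pd.DataFrame,
                            fraud_frac: float = 0.02,
                            seed: int = 0) -> pd.DataFrame:
    """Label a small fraction of transactions as fraud by re-assigning them
    to a different, randomly chosen user, so the operation no longer matches
    the behavior of the user it is attached to. Sketch only: the actual
    generation procedure of the original work may differ."""
    rng = np.random.default_rng(seed)
    df = transactions.copy()
    df["is_fraud"] = 0
    users = df["user_id"].unique()
    fraud_idx = rng.choice(df.index, size=int(len(df) * fraud_frac), replace=False)
    for i in fraud_idx:
        original_user = df.at[i, "user_id"]
        df.at[i, "user_id"] = rng.choice(users[users != original_user])
        df.at[i, "is_fraud"] = 1
    return df
```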

The ML models trained are the following (a training and evaluation sketch follows the list):

  • Decision Tree
  • XGBoost
  • Random Forest
  • Logistic Regression
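A minimal training-and-evaluation sketch with scikit-learn and XGBoost; the synthetic data, the random split, and the hyperparameters below are placeholders, not the setup used in the article (which uses the enriched transactions and the user-based split described above):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from xgboost import XGBClassifier

# Placeholder imbalanced dataset; in practice X would be the (enriched)
# transactions and the split would be the user-based one shown earlier.
X, y = make_classification(n_samples=5000, weights=[0.97], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    print(f"{name}: precision={precision_score(y_test, preds):.3f} "
          f"recall={recall_score(y_test, preds):.3f}")
```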

Results

The metrics used for the evaluation of the models are precision and recall. Each model has been trained both on plain transactions and on enriched transactions (with the addition of the new features that capture the user's behavior).

This is the result reached without the new features:

Models trained just on the transactions as they are

This is the result where each transaction also contains the new features:

Models trained on the transactions with the features related to the user's behavior

The best results obtained by XGBoost, Random Forest, and Decision Tree are almost equivalent. It is clear that feature engineering is a key process.

Conclusion

The results obtained with the models trained on the enriched transactions are in many ways better than those obtained with the models trained on plain transactions. The decision boundary learned thanks to the new features yields great results. The reason is that relating an incoming user operation to the user's behavior adds relevant information and brings high precision and recall. Due to the nature of the manipulation done during dataset preparation, the results shown may not reflect reality 100%. This is not a problem, because the focus should not be on the absolute values of these metrics but on the gain that can be obtained with a feature-engineering process that captures the user's behavior.

I hope that the article was interesting, thanks for reading, and if you liked it, leave a clap 👏

Here is another article I wrote about Imbalanced classification in Fraud Detection
