Credit card fraud detection with synthetic data and AutoML
Part 1: Investigating fraudulent transactions in real-time with atoti
Credit card fraud figures were boosted by the COVID-19 pandemic, making it more vital than ever to be able to detect credit card fraud quickly and effectively. We split this article into two parts:
- Investigating fraudulent transactions from a business user’s point of view
- Behind-the-scenes technical implementation of the solution
Before we go further, let’s take a look at the rationale behind the need for credit card fraud detection in a real-world situation.
Increasing use of credit cards leads to a greater number of fraudulent transactions
With the frequent lockdowns over the past two years, more people turned to online shopping, and companies accelerated the digitization of their businesses in response. E-commerce sales spiked in 2020 and are predicted to keep growing.
More online transactions mean more credit card transactions: even PayPal can be linked to a credit card. Moreover, contactless payment at retail outlets is now strongly encouraged. We can pay with the credit card’s “tap and go” feature, or with other payment services such as Alipay, Apple Pay, Google Pay, GrabPay and ShopeePay, which are usually linked to the user’s credit card.
Credit card fraud usually involves unauthorized transactions. Stolen credit cards, account takeovers resulting from database security lapses or phishing, and application fraud are just some of the means fraudsters use to commit the crime.
Zero Liability Policy to safeguard consumers against credit card fraud
Did you know that major credit card networks such as American Express, Discover, Mastercard and Visa have a zero liability policy provisioned against unauthorized transactions? If fraud is discovered or reported promptly, the cardholder will not be held responsible for its charges. It’s good to check the fine print on the conditions though.
Fraudulent charges must be reported within 60 days of receiving the billing statement with the suspicious charge. Financial institutions also run their own artificial intelligence to monitor our accounts for suspicious transactions. You have probably received an email or a mobile alert asking whether a recent transaction is familiar, or a call from bank staff about an overseas purchase.
But this isn’t foolproof. Ultimately, someone still has to bear the cost of the fraud, whether the consumers, the merchants or the issuers. Let’s look at how we can make use of machine learning to detect potential fraud.
Real-time transactions monitoring and fraud investigation
Since we do not have actual transaction data, we used synthetic data with fraud patterns spread across different consumer profiles. Our dataset includes a fraud indicator that labels the fraudulent transactions. In real life, however, fraudulent transactions do not come with such labels.
So how do we identify them?
There are two parts to this:
- Using PyCaret, an AutoML library, to create an ML model that detects fraud
- Using atoti, a Python BI analytics tool, to translate the predictions into business metrics
All incoming credit card transactions go through the trained machine learning (ML) model for fraud detection in batches (to replicate incoming transactions for a real-time demonstration). We then upload the transactions, along with the predictions and scores from the ML model, into atoti. You can read more about this in part two of this article.
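To make the batching idea concrete, here is a minimal sketch in plain Python. It is not the article's implementation: `toy_model` is a hypothetical stand-in for the trained PyCaret model, the field names (`id`, `amount`, `score`, `predicted_fraud`) and the 0.5 threshold are assumptions, and the upload into atoti is left out.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator

Transaction = dict  # one transaction as a plain dict; field names are hypothetical


def batches(transactions: Iterable[Transaction], size: int) -> Iterator[list]:
    """Yield successive lists of at most `size` transactions."""
    it = iter(transactions)
    while batch := list(islice(it, size)):
        yield batch


def score_batch(
    batch: list,
    predict_proba: Callable[[Transaction], float],
    threshold: float = 0.5,  # assumed decision threshold
) -> list:
    """Attach a fraud score and a predicted label to every transaction."""
    scored = []
    for txn in batch:
        score = predict_proba(txn)
        scored.append({**txn, "score": score, "predicted_fraud": score >= threshold})
    return scored


def toy_model(txn: Transaction) -> float:
    """Stand-in for the trained model; a real model would be loaded instead."""
    return 0.9 if txn["amount"] > 500 else 0.1


amounts = [20.0, 750.0, 45.5, 1200.0, 9.99]
incoming = [{"id": i, "amount": a} for i, a in enumerate(amounts)]

# Each scored batch would then be appended to the BI tool's table.
scored_batches = [score_batch(b, toy_model) for b in batches(incoming, size=2)]
```

With five transactions and a batch size of two, this produces three batches, the last one holding a single transaction.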
Getting real-time fraud statistics
Tada! See the incoming transactions get translated into the various metrics in real time!
We have a global view of the number of fraudulent transactions predicted by machine learning. Statistically speaking, the F1 score is fairly consistent at around 0.9 and the recall is around 86%, which is acceptable. These statistics are in line with the values from the tuned model. Refer to part 2 of the article to see the statistics of the trained models.
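For readers who want to see how recall and the F1 score relate to the fraud labels and the predictions, the sketch below computes them from scratch. The labels are made-up illustration data, not the article's results.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = fraud)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)      # fraud caught
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)  # false alarms
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)  # fraud missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Recall is the share of true frauds that the model catches, so a recall of 86% means roughly one fraud in seven goes undetected by the model alone.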
From above, we see that the amount involved in fraudulent transactions is higher for women above 55 years old. Similarly, the youngest and oldest populations are more susceptible to fraud. Perhaps we could start a fraud prevention campaign for these populations.
Most fraudulent transactions occur on weekends and at night.
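A statistic like this can be derived from the transaction timestamps. The sketch below is one possible way to do it; the definition of "night" (22:00 to 06:00) and the sample timestamps are assumptions made for illustration.

```python
from datetime import datetime

NIGHT_START_HOUR = 22  # assumed start of "night"
NIGHT_END_HOUR = 6     # assumed end of "night"


def is_weekend(ts: datetime) -> bool:
    """Saturday is weekday 5 and Sunday is weekday 6."""
    return ts.weekday() >= 5


def is_night(ts: datetime) -> bool:
    """True between NIGHT_START_HOUR and NIGHT_END_HOUR, wrapping midnight."""
    return ts.hour >= NIGHT_START_HOUR or ts.hour < NIGHT_END_HOUR


def share(flags) -> float:
    """Fraction of True values in an iterable of booleans."""
    flags = list(flags)
    return sum(flags) / len(flags) if flags else 0.0


# Made-up timestamps of transactions flagged as fraud.
fraud_timestamps = [
    datetime(2020, 8, 1, 23, 15),  # Saturday, late evening
    datetime(2020, 8, 2, 2, 40),   # Sunday, early morning
    datetime(2020, 8, 3, 14, 5),   # Monday afternoon
    datetime(2020, 8, 8, 10, 30),  # Saturday morning
]

weekend_share = share(is_weekend(ts) for ts in fraud_timestamps)
night_share = share(is_night(ts) for ts in fraud_timestamps)
```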
Investigating from the customer’s perspective…
In atoti, we are able to design a dashboard that reflects all the incoming transactions. By applying filters in the dashboard, we are able to investigate higher-risk transactions.
From the above dashboard, we can see that this particular customer has zero suspicious transactions in July but has 15 such transactions in August detected by machine learning. The transaction amounts involved are also on the high side.
It seems that the fraudulent transactions start occurring in August and the transaction amount has a growing trend.
If we include the “presumed” non-fraudulent transactions, there are more transactions in August than in July.
Zooming in on the largest transacted amount, we can see that it was spent on travel. It is plausible that the increase in spending is due to an upcoming trip.
In any case, we can still call the customer to verify if these transactions are legitimate. After all, there’s nothing wrong with being on the safe side.
Since we have the actual fraud label in our dataset, we can also drill down on the fraudulent transaction to see if it’s a true fraud.
Ways to prioritize transactions
1 — Color coding transactions to give a high-level view of the urgency
With color coding, red cells show transactions with a higher probability of being true fraud. In our case, orange cells show transactions with a monetary value greater than $500 and yellow cells show transactions between $100 and $500. At a glance, we get an idea of the number of transactions that have to be addressed.
Likewise, we highlighted the distance between the customer and the merchant in pink if it is farther than 85 km. Rare transactions made far away are more suspicious, as customers are unlikely to go out of their way to make a purchase. This may not hold in the current pandemic situation, though, where many people buy online from overseas portals.
We used the mean distance as the gauge here. You are free to decide the appropriate distance to flag warnings.
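The highlighting rules above can be written down as small functions. This is only a sketch of the rules, not the dashboard's actual formatting configuration: the score cut-off of 0.8 is an assumption (the article does not state one), and computing the customer-to-merchant distance with the haversine formula is also an assumption about how such a distance could be obtained from coordinates.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
SCORE_THRESHOLD = 0.8         # assumed cut-off for "higher probability"
DISTANCE_THRESHOLD_KM = 85.0  # mean distance used as the gauge


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def score_color(score):
    """Red when the model score suggests a likely true fraud."""
    return "red" if score >= SCORE_THRESHOLD else None


def amount_color(amount):
    """Orange above $500, yellow from $100 to $500, no color otherwise."""
    if amount > 500:
        return "orange"
    if amount >= 100:
        return "yellow"
    return None


def distance_color(distance_km):
    """Pink when the merchant is farther away than the threshold."""
    return "pink" if distance_km > DISTANCE_THRESHOLD_KM else None


# One degree of longitude along the equator, roughly 111 km.
example_distance = haversine_km(0.0, 0.0, 0.0, 1.0)
example_colors = (
    score_color(0.92),
    amount_color(750.0),
    distance_color(example_distance),
)
```

Keeping each rule in its own function makes it easy to change a single threshold, such as the distance, without touching the others.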
2 — Sorting by monetary value to prioritize transactions
By sorting the transaction amount in descending order, we have access to those transactions with the highest monetary value. We can pick the top transaction with its scoring highlighted in red for investigation.
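Outside the dashboard, the same prioritization could be sketched in a few lines of Python. The records, the field names and the 0.8 score cut-off are hypothetical.

```python
SCORE_THRESHOLD = 0.8  # assumed cut-off for a "red" score

# Made-up scored transactions for illustration.
transactions = [
    {"id": "t1", "amount": 120.0, "score": 0.95},
    {"id": "t2", "amount": 980.0, "score": 0.30},
    {"id": "t3", "amount": 640.0, "score": 0.91},
    {"id": "t4", "amount": 35.0, "score": 0.85},
]

# Highest monetary value first.
by_amount = sorted(transactions, key=lambda t: t["amount"], reverse=True)

# First transaction in that order whose score would be highlighted in red.
top_flagged = next(
    (t for t in by_amount if t["score"] >= SCORE_THRESHOLD),
    None,
)
```

Here the largest transaction (`t2`) has a low score, so the first one picked for investigation is `t3`.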
Investigating from the merchant’s perspective
Let’s switch the page to the “Suspicious transactions (Merchants)” tab.
It’s not really necessary for merchants to have the dashboard displaying transactions in real time. Still, it is worth knowing that atoti can display the data almost instantly after it is loaded.
We can zoom into a specific merchant chain to see if there are certain outlets that are having higher fraud counts than other outlets.
Also, we can see if a certain consumer has multiple fraudulent transactions with the same merchant. If we have refund information in the dataset, we could even see if the same consumer has repeated refunds, which may suggest chargeback fraud.
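As a rough illustration of this check, the sketch below counts predicted-fraud transactions per customer and merchant pair. The field names and the sample records are hypothetical, and the same pattern would apply to refunds if the dataset had them.

```python
from collections import Counter


def repeated_fraud_pairs(transactions, min_count=2):
    """Return (customer, merchant) pairs with at least `min_count` flagged transactions."""
    counts = Counter(
        (t["customer_id"], t["merchant"])
        for t in transactions
        if t["predicted_fraud"]
    )
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Made-up records for illustration.
records = [
    {"customer_id": "c1", "merchant": "shop_a", "predicted_fraud": True},
    {"customer_id": "c1", "merchant": "shop_a", "predicted_fraud": True},
    {"customer_id": "c1", "merchant": "shop_b", "predicted_fraud": True},
    {"customer_id": "c2", "merchant": "shop_a", "predicted_fraud": False},
    {"customer_id": "c2", "merchant": "shop_a", "predicted_fraud": True},
]

repeat_pairs = repeated_fraud_pairs(records)
```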
If the business management team is interested in seeing how these outlets are spread geographically, we have enough data to show this upfront (without having to get developers involved!):
To be continued…
Hopefully, this article has given you some inspiration on the type of analytics we can have with credit card datasets, along with machine learning to detect fraud.
Read on to learn more about how we implement this solution in part 2 of the article.