Fraud Detection in the Banking Industry and the Significance of Machine Learning

Published in

Engineered @ Publicis Sapient

5 min read · Aug 27, 2019

Banking fraud is an ever-growing problem with serious consequences for banks and customers alike, in terms of financial losses, trust, and credibility. According to the Nilson Report, card fraud alone is anticipated to amount to a whopping $30 billion worldwide by 2020. Meanwhile, technology disruption in banking and payments has brought a plethora of payment channels (credit/debit cards, smartphones, kiosks), and the number of transactions has risen sharply in recent years. Fraudsters have also become more sophisticated, adopting innovative tactics. Together, these trends have compounded the problem.

What are the various fraud detection methods employed in the banking industry?

Banking fraud is one category of financial fraud and broadly includes:

Credit Card Fraud
Money Laundering Fraud
Mortgage Fraud

Most banks primarily employ rule-based systems with manual evaluation for detecting fraud. These systems used to do a pretty decent job, but in recent years they have become less reliable. That’s because new fraud patterns evolve rapidly and the rules do not evolve with them, so fraud goes undetected and results in huge financial losses. Some banks also have systems built on an RDBMS, but these perform even worse than rule-based systems.

Considering all these challenges and shortcomings, Machine Learning can play a vital role in effective and efficient fraud detection in the banking industry.

What are the various Fraud Detection Algorithms?

Data mining and computational intelligence techniques are commonly used in fraud detection. Here, I will focus mainly on credit card fraud detection and discuss the techniques, approaches, and lessons learnt, based on our experience in building a Machine Learning application that detects fraudulent credit card transactions in real time.

Techniques and Approaches:

We implemented and evaluated the following Machine Learning algorithms:

Random Forest
Support Vector Machine
K-nearest neighbor
Neural Networks (Feed Forward)

Model Workflow:

The diagram below depicts a high-level, end-to-end ML modeling workflow we followed. Each pipeline depicts a logical stage with steps in the overall ML modeling life cycle.

Training Pipeline stage: This stage deals with data identification and exploration. Its output is the dataset for model training.
Model Creation Pipeline stage: This stage deals with model training. Its output is a trained model that will be leveraged for prediction.
Model Prediction Pipeline stage: This stage is where the trained model runs against the actual dataset, makes predictions, and stores statistics that can be leveraged for reasoning.
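
The three stages above can be sketched as three chained functions. This is only an illustration, not the project's code: the function names, the single amount feature, and the threshold "model" are hypothetical stand-ins for a real dataset and a real trained classifier.

```python
from collections import Counter

# Hypothetical raw records: (amount, is_fraud). Real data would carry many more attributes.
RAW = [(12.0, 0), (8.5, 0), (950.0, 1), (23.0, 0), (1200.0, 1), (40.0, 0)]


def training_pipeline(raw):
    """Stage 1: identify and explore data; the output is the training dataset."""
    # Dropping non-positive amounts stands in for real data cleaning.
    return [(amount, label) for amount, label in raw if amount > 0]


def model_creation_pipeline(dataset):
    """Stage 2: train a model; the output is the trained model (here, one threshold)."""
    fraud = [amount for amount, label in dataset if label == 1]
    legit = [amount for amount, label in dataset if label == 0]
    # Toy "training": place the threshold midway between the two class means.
    threshold = (sum(fraud) / len(fraud) + sum(legit) / len(legit)) / 2
    return {"threshold": threshold}


def model_prediction_pipeline(model, amounts):
    """Stage 3: run the model on new data and keep statistics for later reasoning."""
    predictions = [int(amount > model["threshold"]) for amount in amounts]
    counts = Counter(predictions)
    return predictions, {"flagged": counts[1], "cleared": counts[0]}


dataset = training_pipeline(RAW)
model = model_creation_pipeline(dataset)
predictions, stats = model_prediction_pipeline(model, [15.0, 700.0, 30.0])
print(predictions, stats)
```

Keeping the stages as separate functions means each one can be rerun on its own, for example retraining the model without rebuilding the prediction step.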

Logical Architecture

Monitoring Dashboard

Key Challenges and Considerations:

Training Dataset:

Given data privacy and PII considerations, it is difficult to source actual fraudulent credit card transactions. The training data would therefore have to be handcrafted, which leaves it imbalanced: fraudulent transactions are under-represented, the model picks up biases as a result, and the dataset does not represent the real-world problem accurately.
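
To make the imbalance concrete, the sketch below counts the labels in a hypothetical handcrafted dataset and then applies random under-sampling of the majority class. This is one common rebalancing option, not necessarily what we used, and it carries its own risk: it throws away legitimate examples and can introduce bias. The function name and the data are assumptions for illustration.

```python
import random
from collections import Counter


def undersample_majority(rows, label_of, seed=0):
    """Randomly drop rows from larger classes until every class has equal size."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Hypothetical dataset: 990 legitimate rows (label 0) and 10 fraud rows (label 1).
rows = [(i, 0) for i in range(990)] + [(i, 1) for i in range(10)]
before = Counter(label for _, label in rows)
after = Counter(label for _, label in undersample_majority(rows, lambda r: r[1]))
print(before, after)
```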

Model Accuracy:

Accuracy should not be the metric used to evaluate the model, because the data is heavily skewed towards non-fraudulent behavior: fraudulent transactions might make up less than 1% of the total. A model that labels every transaction as legitimate would then score above 99% accuracy while detecting no fraud at all.

Accuracy = (TruePositive+TrueNegative)/(TruePositive+TrueNegative+FalsePositive+FalseNegative)

Instead, the F1 metric, which combines precision and recall, should be used:

F1 = 2*(precision*recall)/(precision+recall)

Precision = TP/(TP+FP)

Recall = TP/(TP+FN)

Here TP, TN, FP and FN stand for TruePositive, TrueNegative, FalsePositive and FalseNegative.
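
The formulas above can be checked with a small worked example. The confusion-matrix counts below are made up for illustration (10,000 transactions, 0.5% of them fraudulent), and the function name is an assumption.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when nothing is flagged or no fraud exists.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * (precision * recall) / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical batch: 10,000 transactions, 50 of them fraudulent.
# Model A flags nothing; Model B catches 40 frauds and raises 20 false alarms.
model_a = confusion_metrics(tp=0, tn=9950, fp=0, fn=50)
model_b = confusion_metrics(tp=40, tn=9930, fp=20, fn=10)
print(model_a)
print(model_b)
```

Both models score above 99% on accuracy, but F1 separates them clearly: Model A gets an F1 of 0, while Model B gets roughly 0.73.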

Domain Knowledge & ML Techniques:

A sound understanding of the business domain and the right choice of ML techniques (algorithms) are both very important when creating models for efficient and accurate fraud detection. Getting either one wrong means fraud that goes undetected or legitimate transactions that get flagged.

One Model for All:

A single model for every use case will not work. Different use cases need different models, so a modeling strategy/framework is imperative for overall ML success.

Example: real-time fraud detection and anti-money laundering (AML) need separate models.

Model Monitoring:

You should expect new fraud patterns to emerge every day. These can degrade the model's prediction quality and, with it, fraud detection. Hence, it is critical to monitor the models automatically. Monitoring shows when to retrain the models with new data and helps capture the attributes needed to report the exact cause of a fraud.
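
One way to automate such monitoring is to track a quality metric over the most recent labeled predictions and raise a retraining flag when it drops. The sketch below is an assumption about how this could look, not a description of our dashboard: the class name, window size, and F1 threshold are hypothetical.

```python
from collections import deque


class F1Monitor:
    """Track F1 over the most recent labeled predictions and flag degradation."""

    def __init__(self, window=500, min_f1=0.6):
        self.outcomes = deque(maxlen=window)  # (predicted, actual) pairs, 1 = fraud
        self.min_f1 = min_f1

    def record(self, predicted, actual):
        self.outcomes.append((predicted, actual))

    def f1(self):
        tp = sum(1 for p, a in self.outcomes if p == 1 and a == 1)
        fp = sum(1 for p, a in self.outcomes if p == 1 and a == 0)
        fn = sum(1 for p, a in self.outcomes if p == 0 and a == 1)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        return 2 * tp / denom if denom else None  # None: no fraud seen or flagged yet

    def needs_retraining(self):
        score = self.f1()
        return score is not None and score < self.min_f1


monitor = F1Monitor(window=4, min_f1=0.6)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (0, 0)]:
    monitor.record(predicted, actual)
healthy = monitor.needs_retraining()  # the model is right on every item here

# A new pattern the model misses: two frauds slip through, one false alarm.
for predicted, actual in [(0, 1), (0, 1), (1, 0), (0, 0)]:
    monitor.record(predicted, actual)
drifted = monitor.needs_retraining()
print(healthy, drifted)
```

In practice the true labels arrive late (for example after a chargeback), so the window would be filled from confirmed cases rather than in real time.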

Summary and Observations

We compared the performance of all four algorithms we implemented and found that the neural network gave the best results. The benchmarking and comparisons were done on the F1 score (using precision and recall). Finally, we created a REST API in Flask for real-time fraud prediction using the feedforward neural network model we trained.
When detecting fraudulent credit card transactions in real time, it is tricky to flag a single transaction as fraud, because fraud usually doesn’t happen in one big transaction but is spread across multiple smaller transactions over a period of time. In such cases, a fraud score/weightage-based approach should be used. This helps detect both transaction-level and account-level fraud more effectively and accurately.
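
A score-based approach of this kind can be sketched as a per-account running total over a sliding time window. The sketch is illustrative only: the class name, window length, threshold, and the per-transaction scores (which would come from the model) are all assumptions.

```python
from collections import defaultdict, deque


class AccountFraudScorer:
    """Accumulate per-transaction fraud scores per account over a sliding time window."""

    def __init__(self, window_seconds=3600, threshold=1.0):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.history = defaultdict(deque)  # account_id -> deque of (timestamp, score)

    def add(self, account_id, timestamp, score):
        """Record one scored transaction; return (account_score, flagged).

        Assumes timestamps for an account arrive in increasing order.
        """
        events = self.history[account_id]
        events.append((timestamp, score))
        # Forget transactions that have fallen out of the window.
        while events and events[0][0] <= timestamp - self.window_seconds:
            events.popleft()
        total = sum(s for _, s in events)
        return total, total >= self.threshold


scorer = AccountFraudScorer(window_seconds=3600, threshold=1.0)
# Four small transactions, none alarming alone (hypothetical model score 0.3 each).
results = [scorer.add("acct-1", t, 0.3) for t in (0, 600, 1200, 1800)]
flags = [flagged for _, flagged in results]
print(flags)
```

No single transaction crosses the threshold, but the fourth one pushes the account total over it, which is the pattern a transaction-by-transaction check would miss.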

Conclusion

From our experience, ML-based systems are better equipped to detect fraud than rule-based systems, because they can detect and recognize thousands of patterns. In the banking industry, we are definitely seeing more and more banks embrace ML for fraud detection. A model built only on supervised ML techniques would not be enough to detect fraud efficiently and provide the required insights. I believe a model with the right combination of supervised and unsupervised ML techniques would enable banks to detect fraud more accurately.

I still believe we have some way to go when it comes to leveraging unsupervised learning techniques for fraud detection. More detailed research is needed across different ML techniques, considering aspects such as their performance, hyper-parameter optimization, and operationalization.


Written by Rajeshkumar I.P