How Can Machine Learning Systems Help Detect Fraud in Fintech?

Published in Arteos AI, Mar 15, 2020

Why now?

Financial technology, often called FinTech, covers businesses that use technology to enhance financial services and processes. The industry is enormous, and one of the driving factors is that many traditional companies and banks are actively investing in, acquiring, or partnering with fintech companies to bring innovation to their business. At the same time, the industry suffers heavily from fraud-related losses and damages.

Everything is going digital these days. This opens new channels for distributing financial services, but it also creates a productive environment for fraudsters. Roughly speaking, it takes more than forty days to detect a possible fraud, and more than 20% of clients change banks after a fraud or other scam is committed. With this in mind, one of the biggest challenges for the industry is how to detect and prevent possible fraud cases in real time.

Traditional rule-based systems in comparison with Machine Learning

In recent years, with ongoing digitalization and the growth of digital banking and financial services, integrating machine learning into fintech has never been more important. For most of the industry, switching from rule-based systems to real-time detection backed by machine learning is imperative. So what are the most significant differences between the two?

The rule-based approach: everything is based on evident business logic and signals. Scenarios are written manually by analysts and fraud experts. These rules are straightforward by design: they cannot be modified dynamically to learn from new scenarios, and they cannot detect implicit correlations. In addition, legacy software is in most cases not open to real-time data streaming.

ML-based fraud detection: for fraud cases that are not evident but hidden and subtle, machine learning offers algorithms that can process large data streams with many variables. This makes it possible to detect hidden correlations between user behavior and the likelihood of fraud or a scam. In addition, faster data processing is one of the most significant immediate benefits compared with manually adding new rules.

Although rule-based systems fall short of ML-driven ones in these respects, they still dominate the market.
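To make the contrast concrete, here is a minimal sketch of what a hand-written rule set might look like. The `Transaction` fields, the thresholds, and the rule names are all hypothetical and chosen only for illustration; a real rule set would come from fraud analysts. Note that nothing here adapts to new data: every change means editing the constants by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Transaction:
    amount_eur: float
    country: str
    hour: int  # local hour of day, 0-23


# Hypothetical static thresholds, assumed for this example only.
MAX_AMOUNT_EUR = 1000.0
HOME_COUNTRY = "DE"
NIGHT_HOURS = range(0, 5)


def rule_based_flags(tx: Transaction) -> List[str]:
    """Return the names of all static rules the transaction violates."""
    flags = []
    if tx.amount_eur > MAX_AMOUNT_EUR:
        flags.append("amount_over_limit")
    if tx.country != HOME_COUNTRY:
        flags.append("foreign_country")
    if tx.hour in NIGHT_HOURS:
        flags.append("night_time")
    return flags


print(rule_based_flags(Transaction(2000.0, "DE", 14)))  # ['amount_over_limit']
print(rule_based_flags(Transaction(40.0, "DE", 20)))    # []
```

A fraudster who stays just below each threshold passes every rule, which is the kind of implicit pattern a learned model is meant to pick up.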

One good example of an innovative company applying machine learning to fraud detection is MasterCard. It integrated machine learning and AI to track and process variables such as transaction size, location, time, device, and purchase data. It validates clients’ behavior in real time and provides a judgment on possible fraud. A recent study shows that merchants lose about $118 billion per year, while clients lose about $9 billion per year. Fraud prevention is clearly a strategic goal for the banking and payments industries.

Banking and credit card payments

Payments were one of the first areas affected by digitalization, and they are also the most vulnerable. Furthermore, mobile payments and competition for the best customer experience reduce the number of verification steps. Modern fraud detection systems can solve a wide range of analytical problems and detect many kinds of scams in financial streams.

Data credibility assessment - eliminating human errors in the reconciliation of paper documents and system data. Gap analytics can also help detect missing values and anomalies in the sequence of transactions. The bottom line is that data credibility is established using public sources and the history of transactions.

Duplicate transactions - you may have already heard about this case; it is one of the most common. The fraudster creates transactions that closely resemble the original one and repeats them n times. For example, a company may try to charge a client several times for the same invoice by sending it to two branches in different locations. Rule-based systems are slow and often fail to detect this behavior promptly, so the counterparty can end up paying twice or buying double the amount of goods. Machine learning systems can distinguish between human errors and suspect duplicates.

Account theft and unusual transactions - when we speak about fraud, the first thing that should come to mind is behavior analysis of our clients during transactions. For example, our client visits a store at 8–9 pm every night. The store is usually close to the client’s home, and the average payment is around 40 euros. Every two weeks, the client also drives to the gas station. Now imagine that we receive an anomalous request: a withdrawal of two thousand euros, 100 km away from the client’s home. The system flags this behavior as possible fraud and alerts the client on their phone with a verification request.
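The account-theft example above can be sketched in a few lines. This is only an illustration under stated assumptions: the client profile is reduced to a list of past payment amounts and a home coordinate, and the function name `unusual_reasons`, the 50 km radius, and the z-score cutoff of 3 are hypothetical choices, not values from any production system.

```python
import math
from statistics import mean, stdev


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def unusual_reasons(history_amounts, home, tx_amount, tx_location,
                    max_km=50.0, max_z=3.0):
    """Return the reasons (if any) why a transaction deviates from the profile."""
    reasons = []
    mu, sigma = mean(history_amounts), stdev(history_amounts)
    if sigma > 0:
        z = abs(tx_amount - mu) / sigma
    else:
        # No spread in the history: any different amount counts as extreme.
        z = 0.0 if tx_amount == mu else math.inf
    if z > max_z:
        reasons.append("amount")
    if haversine_km(*home, *tx_location) > max_km:
        reasons.append("distance")
    return reasons


# Hypothetical client: evening purchases of about 40 euros near home.
history = [38, 42, 40, 41, 39, 40, 43, 37]
home = (48.1374, 11.5755)
far_away = (49.4521, 11.0767)  # well over 100 km from home

print(unusual_reasons(history, home, 2000.0, far_away))  # ['amount', 'distance']
print(unusual_reasons(history, home, 41.0, home))        # []
```

A non-empty result would trigger the verification request sent to the client's phone; an empty one lets the transaction through.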

Loan applications

When we speak about lending and fraud, it all comes down to personal information. Ten or twenty years ago, getting someone’s personal information was almost mission impossible. These days, almost all data can be found on the internet and on social networks like Facebook, Instagram, LinkedIn, or even TikTok. A client’s data, such as IDs, latest photo, address, and phone numbers, is now a mouse click away. All of this makes life very difficult for a bank. With smarter fraudsters, we need smarter solutions for detection and prevention, but without overcomplicating the loan process, because our clients want to get the money as soon as possible.
So we can conclude that a common type of fraud is, in most cases, based on providing false personal details or information. We can address this problem in the following ways.

Customer relationship with the bank is essential, and reviewing this history is crucial. In addition, looking for inconsistencies and quickly verifying record fields is very important.
Scoring models calculate the probability that fraud is being committed and grade it against a standard scale. In this way, we can detect applications that are more likely to be fraudulent. With machine learning, we can classify these applications into groups and minimize cost by concentrating review time only on the risky loan groups. Properly distinguishing between fraudsters and problematic borrowers ensures better credit statistics overall.
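As a rough sketch of the scoring idea, the snippet below turns a few application signals into a probability-like score and then grades it into bands. Everything specific here is an assumption made for illustration: the feature names, the hand-set weights, and the band cutoffs. A real scoring model would learn its weights from labeled historical applications.

```python
import math

# Hypothetical weights, set by hand for illustration only.
WEIGHTS = {
    "id_mismatch": 2.5,               # 1 if ID details disagree across documents
    "address_changed_recently": 1.0,  # 1 if the address changed shortly before applying
    "years_as_customer": -0.4,        # longer history lowers the score
}
BIAS = -3.0


def fraud_score(application):
    """Logistic score between 0 and 1; higher means more likely fraudulent."""
    z = BIAS + sum(w * float(application.get(name, 0)) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def risk_band(score, medium=0.2, high=0.6):
    """Grade a score against a simple three-level scale."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


applications = [
    {"id_mismatch": 1, "address_changed_recently": 1, "years_as_customer": 0},
    {"id_mismatch": 1, "address_changed_recently": 0, "years_as_customer": 2},
    {"id_mismatch": 0, "address_changed_recently": 0, "years_as_customer": 5},
]
for app in applications:
    print(risk_band(fraud_score(app)))  # high, medium, low
```

Only the "high" and perhaps "medium" groups would then go to manual review, which is where the cost saving comes from.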

Machine Learning Solutions: Common vs Advanced systems

In the sections above, we described the most common fraud scenarios. In this section, we shed light on how machine learning algorithms can be built to help banks and fintech companies in general, and on the most common approaches to creating such engines.

Anomaly detection to reveal suspicious transactions

One of the most common approaches to fraud detection is anomaly detection. As fraud cases represent an anomaly in the whole bank portfolio, an algorithm that can detect such cases is the expected first choice. It is based on a classification approach in which the bulk of the data follows a normal (typical) distribution and the remainder are outliers. In this case, outliers are transactions that deviate from normal ones and are flagged as potential fraud.

Now comes the obvious question: what data should I use? Variables range from transactional details to locations and images. The idea is to answer questions such as these:

Do clients access services in the expected way?
Do we have typical user actions?
How atypical are monitored transactions?
Did the user provide inconsistent personal information?

This is perhaps the most direct way of attacking the problem, with simple questions and binary answers. If a transaction is suspicious, the system asks for more validation, and so on. This approach is very good at supporting traditional rule-based systems.
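One simple way to answer "how atypical is this transaction?" is an outlier test on a single variable. The sketch below uses the modified z-score based on the median absolute deviation (MAD), a standard robust statistic; it is a deliberately simple stand-in for this example, not the method of any particular fraud system, and the function name and the sample amounts are made up.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # More than half the values are identical, so there is no spread
        # to measure against; this sketch then reports no outliers.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical payment amounts in euros; the last one is the odd one out.
amounts = [38, 42, 40, 41, 39, 40, 43, 37, 2000]
print(mad_outliers(amounts))  # [8]
```

The median and MAD are used instead of the mean and standard deviation because a single extreme transaction would otherwise inflate the spread and hide itself. Real systems look at many variables at once, but the binary outcome is the same: flagged transactions get additional validation.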

Advanced fraud detection systems

Apart from detecting anomalies, advanced machine learning systems can detect patterns in clients’ behavior and signals that lead to early detection and prevention of fraud scenarios. The two most common groups of machine learning approaches used in fraud detection systems are:

unsupervised machine learning algorithms: detect hidden relationships in unlabeled data and group the data into newly created clusters
supervised machine learning: train the algorithm on already labeled data from historical events. The main goal is to train the model to predict future cases of fraudulent behavior from the provided data

The best results come from combining the two.

Labeling data with unsupervised machine learning algorithms: it is tough for humans to classify new and sophisticated fraud attempts by their implicit similarities. Unsupervised learning is well suited to grouping data items into clusters that capture hidden correlations. This speeds up the data labeling phase and can improve accuracy.
Training a supervised machine learning algorithm: once our data is labeled correctly, we can apply supervised models and train the algorithm to detect and classify potentially fraudulent transactions.
Ensembling models: ensembling multiple different models is a common approach in data science. The models analyze the same transaction or client behavior and then “vote” to make a final decision. This way, we leverage the strengths of several different methods and reduce their weaknesses.
Setting an express verification: split the verification into two steps. The first should be simple anomaly detection or another machine learning algorithm that divides the dataset into two groups, regular and possibly fraudulent. Then, more sophisticated approaches are applied only to the suspicious group. This approach is computationally efficient, and it can increase the accuracy of our model.
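The last two ideas, voting and two-step verification, can be sketched together. In the snippet below, the cheap first-stage filter and the three second-stage "models" are hypothetical stand-ins written as plain functions; in practice each would be a trained model. The transaction fields and all thresholds are likewise assumptions made for the example.

```python
def cheap_filter(tx):
    """Stage 1: a fast, coarse check that splits regular from possibly fraudulent."""
    return tx["amount"] > 500 or tx["km_from_home"] > 50


# Stage 2: stand-ins for more expensive models; each returns True for "fraud".
def model_amount(tx):
    return tx["amount"] > 1500


def model_distance(tx):
    return tx["km_from_home"] > 80


def model_new_device(tx):
    return tx["new_device"]


def majority_vote(models, tx):
    """Return True when more than half of the models vote for fraud."""
    votes = sum(1 for model in models if model(tx))
    return votes * 2 > len(models)


def verify(tx, models):
    """Run the expensive ensemble only on transactions the cheap filter flags."""
    if not cheap_filter(tx):
        return "regular"
    return "fraud_suspected" if majority_vote(models, tx) else "cleared"


models = [model_amount, model_distance, model_new_device]
print(verify({"amount": 40, "km_from_home": 2, "new_device": False}, models))      # regular
print(verify({"amount": 2000, "km_from_home": 100, "new_device": False}, models))  # fraud_suspected
print(verify({"amount": 600, "km_from_home": 5, "new_device": False}, models))     # cleared
```

Most traffic exits at the first stage, so the cost of the ensemble is paid only for the small suspicious group.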

Conclusion

How do you pick the right approach and machine learning algorithm? It is not an easy task at all. It depends on the problem type, available data, resources, and internal knowledge. Today, fraud detection and prevention systems in banks and FinTech companies should meet the following standards:

detect and prevent fraud in real time
improve data credibility, since data should be of unquestioned quality
analyze user behavior, one of the most important stepping stones for quality fraud detection
detect hidden correlations and behavior patterns with machine learning algorithms

Be aware that, even though machine learning systems can live up to all of these standards, they have their own drawbacks. The first is perhaps the most important: the data. A large amount of it is needed, and it must be carefully prepared for training the machine learning algorithm. The second is that, no matter how sophisticated the machine learning models are, the system still leans on rule-based engines, for example to check the legal limit on cash transactions. The third is that implementing these solutions ultimately requires strong data science skills, either in-house or by hiring an external vendor, to build a sophisticated and robust ensembled algorithm.