How Machine Learning Helps With Fraud Detection

By Gerry Carr

Published in RTInsights, Oct 22, 2016

Fraud detection with machine learning requires large datasets to train a model, weighted variables, and human review only as a last defense.

Advances in computer technology and ecommerce have also brought increased vulnerability to fraud. Hackers continually find new ways to target unsuspecting victims, from stolen credit card details to fake accounts. Any business or individual that uses online payment methods is exposed to fraud.

In 2015, losses from financial fraud in the UK (including payment cards, remote banking and cheques) rose a staggering 26 percent on the previous year, to £755 million. It was the fourth consecutive year of increases. Fraud is the most common crime in the UK, with 2.47 million offenses reported in 2015–2016 alone. Its multi-million-pound cost is hurting online businesses, which foot a high proportion of the bill through chargebacks.

In 2015, all fraud types rose, largely owing to the growth of impersonation and deception scams and of sophisticated online attacks such as malware and data breaches.

Why transaction rules aren’t enough

The traditional approach to this problem is to use rules, or logic statements, to query transactions and to direct suspicious ones to human review. Although implementations vary, over 90 percent of online fraud detection platforms, including those used by banks and payment gateways, still use this method. It can be effective where there is a sufficient gap between an order being received and the goods being shipped, but it is also very costly and far slower than the alternatives.

The “rules” in these platforms are based on a combination of data, horizon-scanning and gut feel, and the system is backed by manual reviews to confirm experts’ decisions. Take the recent reports of an abundance of Turkish credit cards available on the darknet following a publicized data breach in Turkey: businesses that recognize the increased risk of fraud on Turkish cards can simply add a rule to review any transaction made with a Turkish credit card.

From then on, every attempted purchase made with such a card raises an alert and is declined or reviewed. However, this raises two significant issues. First, such a generalized rule may turn away millions of legitimate customers, ultimately losing the business money and damaging customer relations. Second, while it can deter repeat attacks once a fraud pattern has been found, it fails to identify or predict threats that the business is not yet aware of.

These rules tend to produce binary results, classing each transaction as either good or bad with nothing in between. And until someone manually revisits the rules, the system will keep stopping transactions such as those from Turkish credit cards, even after the threat has receded.
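To make the limitation concrete, here is a minimal sketch of a rule-based screen, assuming a hypothetical `Transaction` record and two hand-written rules (the names, the `"TR"` country code check, and the amount and account-age cut-offs are all illustrative, not taken from any real platform). Every rule is a fixed predicate, and the outcome is a yes/no flag with no sense of degree.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transaction:
    amount: float
    card_country: str       # ISO 3166-1 alpha-2 code of the card's issuing country
    account_age_days: int


# Hypothetical rule set: each rule is a name plus a hand-written predicate.
Rule = Tuple[str, Callable[[Transaction], bool]]

RULES: List[Rule] = [
    # Added by hand after the breach reports; it stays active until someone removes it.
    ("turkish_card", lambda t: t.card_country == "TR"),
    # Illustrative threshold rule; the numbers are made up.
    ("large_order_new_account", lambda t: t.amount > 1000 and t.account_age_days < 7),
]


def screen(txn: Transaction, rules: List[Rule] = RULES) -> Tuple[bool, List[str]]:
    """Return (flagged, names of the rules that fired).

    The verdict is binary: any rule firing sends the transaction to
    decline or manual review, with no notion of how risky it is.
    """
    fired = [name for name, predicate in rules if predicate(txn)]
    return bool(fired), fired


print(screen(Transaction(25.0, "TR", 900)))  # (True, ['turkish_card'])
print(screen(Transaction(25.0, "GB", 900)))  # (False, [])
```

A long-standing customer buying a £25 item is flagged purely because of the card's country, and that flag persists for as long as the rule stays in the list.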

When human review fails

Criminal gangs also use malware and phishing emails to compromise customers’ security and personal details, including card data. Once they have these details, fraudsters use them to access customer accounts or to commit fraud directly. In these cases the card data is legitimate but is being used without the owner’s consent, so rules such as the one above would fail to block the transactions.

According to the 2015 Merchant Risk Council (MRC) Global Fraud Survey, merchants typically review 10–15 percent of online orders manually. Other reports suggest that as many as 26 percent of ecommerce orders are manually reviewed. This is a prime example of human effort being spent on work that computing could do more efficiently.

Machine learning for fraud detection

Machine learning has proven effective for fraud detection. Each online transaction generates a great deal of data and ultimately has a binary outcome: genuine or fraudulent. Online businesses can identify fraudulent transactions accurately because they receive chargebacks on them; however, this happens after the transaction has been processed, so it is reactive rather than proactive.

Machine learning works from large historical datasets built by pooling data across many clients and industries. Even companies that process relatively few transactions can take full advantage of the datasets for their vertical and get accurate decisions on each transaction. This aggregated data makes for a large, representative training set, and access to it lets businesses choose the model that best balances precision and recall: precision is the proportion of transactions the model flags as fraudulent that actually are fraudulent, while recall is the proportion of all fraudulent transactions that the model flags.
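The two measures can be computed directly from labeled outcomes. The sketch below assumes 1 marks a fraudulent transaction and 0 a genuine one; the ten-transaction example is made up for illustration.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Labels: 1 = fraudulent, 0 = genuine (e.g. confirmed by chargebacks)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # genuine wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud that slipped through
    precision = tp / (tp + fp) if tp + fp else 0.0  # of the flagged, how many were fraud
    recall = tp / (tp + fn) if tp + fn else 0.0     # of the fraud, how much was flagged
    return precision, recall


# Made-up example: 4 frauds among 10 transactions; the model flags 3, 2 of them correctly.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")  # precision=0.667 recall=0.500
```

Raising the model's flagging threshold usually trades recall for precision (fewer good customers turned away, more fraud missed), which is why the choice of model and threshold is a business decision as much as a technical one.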

Once the models' accuracy is deemed acceptable, it is time to start predicting. But where do those predictions come from?

Features are constructed from the datasets. These are data points such as the age and value of the customer account and the origin of the credit card. There can be hundreds of features, each contributing to a varying extent towards the fraud probability. Note that the degree to which each feature contributes to the fraud score is not set by a fraud analyst; it is learned by the model from the training set. So, in the case of the Turkish card fraud, if Turkish cards are shown to be used heavily for fraud, a transaction made with a Turkish credit card will be weighted correspondingly high. If that use were to diminish, the feature's contribution would fall accordingly as the model learns from newer data. Simply put, these models learn from the data itself rather than from explicitly programmed rules or manual review.
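The idea that weights come from the data, not from an analyst, can be sketched with a tiny logistic regression trained by batch gradient descent. This assumes a single hypothetical feature, `is_turkish_card`, and two synthetic histories invented for illustration: one in which Turkish cards are heavily used for fraud, and one in which they are no riskier than any other card. Real systems use hundreds of features and an established library rather than this hand-rolled trainer.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: List[List[float]], y: List[int],
                   epochs: int = 2000, lr: float = 1.0) -> List[float]:
    """Fit logistic regression by batch gradient descent on mean log loss.

    Returns [bias, w_1, ..., w_n]; the weights come from the data alone.
    """
    n = len(X[0])
    w = [0.0] * (n + 1)
    m = len(X)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / m for wj, g in zip(w, grad)]
    return w


def make_history(tr_frauds: int, other_frauds: int,
                 n: int = 10) -> Tuple[List[List[float]], List[int]]:
    """Synthetic history: n Turkish-card and n other-card transactions, with the
    given number of confirmed frauds in each group. One feature: is_turkish_card."""
    X: List[List[float]] = []
    y: List[int] = []
    for is_turkish, frauds in ((1.0, tr_frauds), (0.0, other_frauds)):
        for i in range(n):
            X.append([is_turkish])
            y.append(1 if i < frauds else 0)
    return X, y


# During the breach: 8 of 10 Turkish-card transactions were fraud vs 1 of 10 others.
w_breach = train_logistic(*make_history(tr_frauds=8, other_frauds=1))
# Later: Turkish cards are no riskier than any other card.
w_later = train_logistic(*make_history(tr_frauds=1, other_frauds=1))
print(f"is_turkish_card weight: breach {w_breach[1]:.2f}, later {w_later[1]:.2f}")
```

In the first history the weight converges to about 3.6, the log-odds gap between an 80 percent and a 10 percent fraud rate; in the second it converges to about zero. Nobody edited a rule: retraining on the newer data alone removed the penalty.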

These learned feature weightings also make it possible for fraud analysts to identify the most significant contributors to a score. Feedback from users confirming the system’s decisions, by marking customers as genuine or fraudsters, further improves the model’s learning and its accuracy.
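For a linear model such as logistic regression, one simple way to surface the most significant contributors is to rank each feature's weight times its value, and one simple way to fold in analyst feedback is a single gradient step on the confirmed label. The sketch below assumes hypothetical feature names and weights, as if from an earlier training run. Production systems typically retrain in batches and may use richer explanation methods.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def explain(weights: Dict[str, float], bias: float,
            features: Dict[str, float]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the fraud probability and each feature's contribution to the
    log-odds (weight * value), largest absolute contribution first."""
    contributions = {name: weights[name] * value for name, value in features.items()}
    score = sigmoid(bias + sum(contributions.values()))
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked


def apply_feedback(weights: Dict[str, float], bias: float, features: Dict[str, float],
                   label: int, lr: float = 0.1) -> Tuple[Dict[str, float], float]:
    """One stochastic gradient step on log loss using an analyst's verdict
    (label 1 = fraudster, 0 = genuine). Returns updated (weights, bias)."""
    p = sigmoid(bias + sum(weights[name] * value for name, value in features.items()))
    err = p - label
    updated = dict(weights)
    for name, value in features.items():
        updated[name] -= lr * err * value
    return updated, bias - lr * err


# Hypothetical weights, as if produced by an earlier training run.
weights = {"turkish_card": 1.2, "new_account": 0.9, "high_value_order": 0.4}
bias = -3.0
txn = {"turkish_card": 0.0, "new_account": 1.0, "high_value_order": 1.0}

score, ranked = explain(weights, bias, txn)
print(f"{score:.3f}", ranked)
# An analyst confirms the customer is a fraudster; the weights of the features
# present in this transaction move up, so similar transactions score higher.
weights, bias = apply_feedback(weights, bias, txn, label=1)
print(f"{explain(weights, bias, txn)[0]:.3f}")
```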

Three steps to predict fraud using machine learning

Extract features from a dataset.
Provide a training set.
Build models.
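The three steps can be sketched end to end as follows. For variety, this uses a Bernoulli naive Bayes classifier (a swapped-in, deliberately simple model, not one named by the article). It assumes hypothetical record fields (`card_country`, `account_age_days`, `amount`, `chargeback`), made-up cut-offs, and a made-up history, with chargebacks serving as the fraud labels.

```python
import math
from typing import Dict, List, Tuple


def extract_features(txn: Dict) -> List[int]:
    """Step 1: turn a raw transaction record into binary features, in the order
    [turkish_card, new_account, high_value_order]. Fields and cut-offs are hypothetical."""
    return [
        int(txn["card_country"] == "TR"),
        int(txn["account_age_days"] < 30),
        int(txn["amount"] >= 500),
    ]


def build_training_set(history: List[Dict]) -> Tuple[List[List[int]], List[int]]:
    """Step 2: pair each transaction's features with its label; here a
    chargeback marks the transaction as fraudulent."""
    X = [extract_features(t) for t in history]
    y = [int(t["chargeback"]) for t in history]
    return X, y


def fit_bernoulli_nb(X: List[List[int]], y: List[int]) -> Dict[int, Tuple[float, List[float]]]:
    """Step 3: fit a Bernoulli naive Bayes model with Laplace (+1) smoothing.
    Returns {label: (log prior, [P(feature_j = 1 | label), ...])}."""
    model = {}
    for label in (0, 1):
        rows = [x for x, yi in zip(X, y) if yi == label]
        log_prior = math.log((len(rows) + 1) / (len(X) + 2))
        probs = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in range(len(X[0]))]
        model[label] = (log_prior, probs)
    return model


def fraud_probability(model: Dict[int, Tuple[float, List[float]]], x: List[int]) -> float:
    """Posterior P(fraud | x) under the naive independence assumption."""
    log_post = {
        label: log_prior + sum(math.log(p if xj else 1.0 - p) for p, xj in zip(probs, x))
        for label, (log_prior, probs) in model.items()
    }
    top = max(log_post.values())  # subtract the max before exponentiating, for stability
    unnorm = {label: math.exp(v - top) for label, v in log_post.items()}
    return unnorm[1] / (unnorm[0] + unnorm[1])


# Made-up history for illustration only.
history = [
    {"card_country": "TR", "account_age_days": 3, "amount": 800, "chargeback": True},
    {"card_country": "TR", "account_age_days": 10, "amount": 650, "chargeback": True},
    {"card_country": "GB", "account_age_days": 5, "amount": 900, "chargeback": True},
    {"card_country": "GB", "account_age_days": 400, "amount": 40, "chargeback": False},
    {"card_country": "TR", "account_age_days": 700, "amount": 60, "chargeback": False},
    {"card_country": "DE", "account_age_days": 250, "amount": 120, "chargeback": False},
    {"card_country": "GB", "account_age_days": 90, "amount": 30, "chargeback": False},
]
model = fit_bernoulli_nb(*build_training_set(history))
new_txn = {"card_country": "TR", "account_age_days": 1, "amount": 950}
print(f"{fraud_probability(model, extract_features(new_txn)):.3f}")
```

Note that the Turkish-card transaction from a 700-day-old account in the history is labeled genuine, so the model does not treat the country as decisive on its own; it weighs it against the other features.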

As for the practicality of building the models, note that models can be reused across similar data. For example, a model built for one retail ecommerce site will also work for another with only minor adjustments to the features, so implementing such a model is relatively quick.

Merchants who use such systems assess customers for fraud continuously, recalculating a score with every action that occurs on their website or app. If an action pushes the fraud probability past the threshold, the transaction is blocked.
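A minimal sketch of that continuous re-scoring, assuming hypothetical session actions, hand-set weights standing in for a trained model, and a hypothetical block threshold of 0.5: each action updates the session's features, the score is recomputed, and the session is blocked as soon as the threshold is breached.

```python
import math
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-session features and weights; in practice these would
# come from a trained model rather than be set by hand.
WEIGHTS: Dict[str, float] = {"failed_login": 0.8, "address_changed": 1.5, "card_added": 1.0}
BIAS = -4.0
BLOCK_THRESHOLD = 0.5  # hypothetical; tuned against precision and recall in practice


def fraud_probability(counts: Dict[str, float]) -> float:
    z = BIAS + sum(WEIGHTS[name] * value for name, value in counts.items())
    return 1.0 / (1.0 + math.exp(-z))


def monitor(actions: Iterable[str]) -> Tuple[bool, List[float]]:
    """Re-score the session after every action; block as soon as the
    fraud probability breaches the threshold."""
    counts = {name: 0.0 for name in WEIGHTS}
    scores: List[float] = []
    for action in actions:
        if action in counts:  # actions without a feature leave the score unchanged
            counts[action] += 1
        p = fraud_probability(counts)
        scores.append(p)
        if p > BLOCK_THRESHOLD:
            return True, scores
    return False, scores


blocked, scores = monitor(["page_view", "failed_login", "failed_login",
                           "address_changed", "card_added", "checkout"])
print(blocked, len(scores))  # True 5  (blocked at "card_added", before checkout)
```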

To lessen, and ideally defeat, the onslaught of fraud, businesses must make faster and more accurate decisions on fraudulent threats. The only mature technology available to achieve this is machine learning. Manual review is most effective as the last line of defense: while it can be invaluable where there is no substitute for human insight, it works best when it helps fine-tune machine learning models' decisions and detect changing patterns of fraud, rather than serving as the sole fraud detection process. Information from detected fraud threats should be used to enhance the next generation of automated fraud prevention, not to replace it.
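One way to cast manual review as the last line of defense is to route only uncertain scores to a human and keep the reviewer's verdicts as new training labels. This is a sketch under assumptions: the two score bands (0.2 and 0.9), the transaction IDs and the stand-in reviewer are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical score bands: confident scores are handled automatically and
# only the uncertain middle band goes to a human reviewer.
APPROVE_BELOW = 0.2
BLOCK_ABOVE = 0.9


def route(p_fraud: float) -> str:
    if p_fraud < APPROVE_BELOW:
        return "approve"
    if p_fraud > BLOCK_ABOVE:
        return "block"
    return "manual_review"


def process(scored: List[Tuple[str, float]],
            reviewer: Callable[[str], bool],
            feedback: List[Tuple[str, int]]) -> Dict[str, str]:
    """Decide each (transaction id, fraud probability) pair. Reviewer verdicts
    are appended to `feedback` as (id, label) so they can be added to the
    next training set."""
    decisions: Dict[str, str] = {}
    for txn_id, p in scored:
        decision = route(p)
        if decision == "manual_review":
            is_fraud = reviewer(txn_id)
            feedback.append((txn_id, int(is_fraud)))
            decision = "block" if is_fraud else "approve"
        decisions[txn_id] = decision
    return decisions


feedback: List[Tuple[str, int]] = []
# A stand-in reviewer who judges every transaction sent to them genuine.
decisions = process([("t1", 0.05), ("t2", 0.55), ("t3", 0.97)],
                    reviewer=lambda txn_id: False, feedback=feedback)
print(decisions)  # {'t1': 'approve', 't2': 'approve', 't3': 'block'}
print(feedback)   # [('t2', 0)]
```

Only one of the three transactions reaches a person, and that person's decision becomes a labeled example for the next model.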


Written by RTInsights Team