The main issue with identifying Financial Fraud using Machine Learning (and how to address it)
Strategies for dealing with imbalanced data
The sheer number of financial transactions that payment processors handle every day is staggering, and only increasing: on the order of 70 million credit card transactions per day in 2012, with fraud losses in the billions of dollars in 2017. At that volume, deciding whether each transaction is legitimate or fraudulent is a job only a computer system can do. The traditional machine learning approach is to build a classifier that helps the human in the loop by reducing the number of transactions they have to review.
The challenge for machine learning classifiers is that fraudulent transactions make up only on the order of 1–2% of the total, so the training data is severely imbalanced.
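A minimal plain-Python sketch shows why this imbalance makes raw accuracy misleading. It assumes hypothetical synthetic labels (15 frauds in 1,000 transactions, i.e. 1.5%) and a trivial "classifier" that labels every transaction as legitimate; the helper names `predict_always_legit`, `accuracy` and `recall` are illustrative and not taken from any library.

```python
# Hypothetical synthetic labels: 1 = fraud, 0 = legitimate.
# 15 frauds in 1,000 transactions (1.5%), within the 1-2% range cited above.
n_total = 1000
n_fraud = 15
labels = [1] * n_fraud + [0] * (n_total - n_fraud)


def predict_always_legit(n):
    """A trivial 'classifier' that labels every transaction as legitimate."""
    return [0] * n


def accuracy(y_true, y_pred):
    """Fraction of transactions whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual frauds that the classifier flagged as fraud."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return true_pos / actual_pos if actual_pos else 0.0


preds = predict_always_legit(len(labels))
print(f"accuracy: {accuracy(labels, preds):.3f}")      # accuracy: 0.985
print(f"fraud recall: {recall(labels, preds):.3f}")    # fraud recall: 0.000
```

The do-nothing model scores 98.5% accuracy while catching none of the fraud, which is why metrics such as recall on the minority class matter far more than accuracy when the classes are this skewed.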
This is an awesome video that shows the challenges machine learning engineers face when systematically detecting fraudulent transactions: