Image from cardozalawcorp.com

Credit Card Fraud Detection

Mohd Amaluddin Yusoff
Nov 1 · 6 min read

An investigation on One-class SVM and Autoencoders

Did you know that global card losses are expected to exceed $35 billion by 2020?

Do I have to worry? Yes! Because 65% of the time, credit card fraud results in a direct or indirect financial loss to you, the user. By 2020, that amounts to $22.75 billion of our hard-earned money!

Let’s dive into two techniques to detect fraud, namely one-class SVM and autoencoders. We want to compare their performance in terms of Recall, Accuracy, Precision and F1-score.

The data
The dataset contains transactions made in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions.

Highly unbalanced dataset

The dataset is highly unbalanced: the positive class, Class 1 (fraud), accounts for only 0.173% of all transactions. Because of this, Recall is given more emphasis than the other scores, i.e., we want to capture as many frauds as possible. The trade-off is that, in doing so, we will mistakenly classify some normal, valid transactions as fraudulent.

One-class SVM

A one-class SVM model is trained using data from a single class, unlike a traditional classifier, which is trained on labelled data with at least two classes. In this case, the model is trained only on Class 0 data, the normal transactions in the training set, which is 80% of the whole dataset. The remaining 20% is the testing data, reserved to test the model.

In this particular investigation, although more than 200,000 data points were available to train the model, only 50,000 were used due to my limited computing power.
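The setup above can be sketched with scikit-learn’s `OneClassSVM`. The post doesn’t state the exact hyperparameters, so the kernel and `nu` below are assumptions, and synthetic two-cluster data stands in for the real 30-feature transactions:

```python
# Minimal sketch: train a one-class SVM on normal transactions only,
# then flag outliers as fraud. Data, kernel and nu are assumptions.
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(2000, 4))  # stand-in for normal transactions
fraud = rng.normal(4.0, 1.0, size=(20, 4))     # stand-in for frauds, shifted away

# Train ONLY on Class 0 (normal) data, as described above.
model = OneClassSVM(kernel="rbf", gamma="auto", nu=0.01).fit(normal[:1600])

# OneClassSVM returns +1 for inliers and -1 for outliers;
# map -1 (outlier) to 1 (fraud) to match the dataset's labels.
X_test = np.vstack([normal[1600:], fraud])
y_pred = (model.predict(X_test) == -1).astype(int)
print(y_pred[-20:].mean())  # fraction of the held-out frauds flagged
```

On the real data the normal subsample would be capped at 50,000 rows, as noted above.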

Here are the performance scores when tested on the testing data.

  • Although Recall for the one-class SVM is 86%, Precision is only 1.5%. It correctly identified 82 out of 95 frauds, but it also wrongly flagged 5,542 normal transactions as fraudulent.
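To see how a high recall can coexist with such a tiny precision, the scores can be recomputed directly from the counts reported in the bullet above:

```python
# Recompute recall, precision and F1 from the reported confusion counts:
# 82 frauds caught out of 95, with 5,542 false alarms.
tp = 82
fn = 95 - 82
fp = 5542

recall = tp / (tp + fn)                              # 82 / 95
precision = tp / (tp + fp)                           # 82 / 5624
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean

print(round(recall, 3), round(precision, 3), round(f1, 3))  # 0.863 0.015 0.029
```

Because frauds are so rare, even a few thousand false positives swamp the 82 true positives, collapsing precision and F1.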

Autoencoder

An autoencoder is a neural network that reconstructs its input as its output. It has an encoder, which compresses the input into features, and a decoder, which decodes those features to produce the output.

Image from curiousily.com

Of course, there will be some differences between the original inputs and the reconstructed outputs (see the original and reconstructed mushrooms in the picture above). This difference is called the reconstruction error.

In this investigation, two encoder layers of size 32 and 16, and two decoder layers of size 16 and 32, were used.

How to make predictions using an autoencoder?

A prediction is made based on the reconstruction error: if the error is above a certain threshold, the data point is considered an anomaly, or in this case a fraud; if the error is below the threshold, the transaction is considered normal. Thus, we have to choose the threshold carefully to achieve good performance.
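This decision rule is a one-liner; here is a minimal sketch (the threshold value and the helper name are illustrative):

```python
# Threshold rule: reconstruction error above the cutoff -> fraud (1),
# otherwise normal (0).
import numpy as np

def predict_from_errors(errors, threshold=3.0):
    return (np.asarray(errors) > threshold).astype(int)

print(predict_from_errors([0.4, 5.2, 2.9, 7.1]))  # -> [0 1 0 1]
```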

Let’s first look at error distribution for the two classes, fraud and normal.

Distribution of Reconstruction Error
  • From the error distribution, every statistic for the normal class is lower than the corresponding one for the fraud class, which is intuitively correct, except for the max. The max for the normal class is higher, probably because of the highly unbalanced dataset.

Based on the error distribution above, a threshold of 3 was selected; here is the performance of the autoencoder.

  • Although the Recall score is lower than that of one-class SVM, the other metrics are higher.

Since the metrics of the autoencoder depend on the threshold of the reconstruction error, let’s investigate the relationship between the scores and the error.

Ideally, we would like both high precision and high recall. I wonder if achieving this is possible at all for the autoencoder.

We can see from the plot that …

  • Precision and recall depend on the error threshold.
  • As the threshold increases, the recall decreases and precision increases.
  • At a very low threshold, although recall is high, precision is low.
  • At a very high threshold, although precision is high, recall is low.
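This trade-off can be traced by sweeping the threshold over a range of values. The sketch below uses synthetic reconstruction errors (exponential distributions with different scales for the two classes, which is an assumption) in place of the real ones:

```python
# Sweep the error threshold and watch recall fall while precision rises.
import numpy as np

rng = np.random.default_rng(1)
errors = np.concatenate([rng.exponential(1.0, 950),   # normal: mostly small errors
                         rng.exponential(8.0, 50)])   # fraud: mostly large errors
y_true = np.array([0] * 950 + [1] * 50)

recalls, precisions = [], []
for t in [0.5, 1.0, 3.0, 10.0]:
    y_pred = (errors > t).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))
    recalls.append(tp / int(y_true.sum()))
    precisions.append(tp / max(int(y_pred.sum()), 1))
    print(f"threshold={t:>4}: recall={recalls[-1]:.2f} precision={precisions[-1]:.2f}")
```

A low threshold flags almost everything (high recall, low precision); a high threshold flags only the most extreme errors (high precision, low recall), exactly as the bullets above describe.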

Can we make predictions using the z-score of the error instead of a threshold?

Let’s consider a transaction a fraud when its fraud z-score is smaller than its normal z-score. That is, for each transaction’s reconstruction error we first compute two z-scores, one based on the mean and standard deviation of the fraud error distribution and one based on those of the normal error distribution. Then, if the z-score under the fraud distribution is smaller than the z-score under the normal distribution, the transaction is considered fraudulent.
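As I read it, the rule assigns each error to whichever class distribution it sits closer to in z-score terms. A minimal sketch, where the class means and standard deviations are illustrative assumptions (in practice they would be estimated from the two error distributions above):

```python
# z-score rule: label a transaction with whichever error distribution
# (normal vs fraud) its reconstruction error is closer to.
import numpy as np

mu_norm, sd_norm = 0.8, 0.5    # assumed stats of normal-class errors
mu_fraud, sd_fraud = 6.0, 3.0  # assumed stats of fraud-class errors

def predict_zscore(errors):
    errors = np.asarray(errors, dtype=float)
    z_norm = np.abs((errors - mu_norm) / sd_norm)
    z_fraud = np.abs((errors - mu_fraud) / sd_fraud)
    # fraud when the error is closer (in z-score terms) to the fraud distribution
    return (z_fraud < z_norm).astype(int)

print(predict_zscore([0.7, 1.5, 5.5, 9.0]))  # -> [0 0 1 1]
```

Unlike the fixed threshold, this rule adapts to the spread of each class’s errors, which may be why no manual tuning was needed.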

Here is the performance of the autoencoder using z-score, and previous results for easier comparison.

  • The prediction scores obtained using the z-score are similar to those obtained using an error threshold of 3.

DISCUSSION

  • From the table above, although the autoencoder performs better than the one-class SVM on every metric except Recall, their performance is similar in general.
  • For making predictions, the one-class SVM is more straightforward to use than the autoencoder. Because an autoencoder reconstructs the input as its output, a further decision has to be made on how to turn the reconstruction error into predictions.
  • Comparing the two autoencoder prediction methods, their performance is quite similar. It is interesting that the z-score method performs about as well as the manually selected error threshold; this would eliminate the need to experiment with threshold values when making predictions.
  • Note that not all of the data was used to train the one-class SVM model, due to the high computing power required, whereas the entire training set was used to train the neural network. It would be interesting to see whether the one-class SVM’s performance would differ if all of the training data were used.

CONCLUSION

We have explored credit card transaction data and used it to develop predictive models to detect fraudulent transactions. Two models were investigated, namely a one-class SVM and an autoencoder. Their performance has been compared and discussed, and the relationship between Precision and Recall for the autoencoder, and their dependence on the reconstruction error threshold, has been examined.

The most interesting finding from this project is that the z-score of the reconstruction error can be used to make predictions for the autoencoder, with performance similar to that of a carefully and manually selected error threshold.

More detailed analysis is available on my GitHub.

Links:
https://www.idexbiometrics.com/top-7-alarming-facts-and-statistics-on-card-fraud/
