Credit Card Fraud Detection using Restricted Boltzmann Machines (RBM)

Omar Merghany · Published in Analytics Vidhya · Mar 6, 2020


U.S. card fraud (credit, debit, etc.) cost $9 billion in 2016, as reported by the Nilson Report, one of the most trusted sources of news and statistics about the global payments industry, and was expected to grow to $12 billion by 2020. That figure is comparable to the revenue of PayPal and Mastercard, which was only $10.8 billion each in 2017. Relying on old rules-based expert systems to catch fraud has proven inadequate: fraudsters learn the new technologies that let them execute fraud through online transactions, they mimic the regular behavior of consumers, and fraud patterns change fast.

Fraud has no constant pattern; fraudsters keep changing their behavior. Anomaly detection systems are therefore trained on normal transactions and use what they learn to flag novel frauds.


Thus, we need unsupervised learning: some fraudsters commit fraud only once through an online medium and then switch to other techniques, and an unsupervised model learns from the data itself rather than from labels.

Autoencoders and RBMs are two deep learning approaches that use normal transactions to detect fraud in real time. However, RBMs can outperform autoencoders because:

  • RBMs are good at handling unlabeled data and at extracting important features from the input.
  • An autoencoder learns to capture as much information as possible rather than as much relevant information as possible.

Autoencoders and RBMs

Autoencoders

Autoencoders are neural networks that aim to copy their inputs to their outputs. They work by compressing the input into a latent-space representation and then reconstructing the output from that representation. The network is composed of an encoder and a decoder, and consists of an input layer, a hidden layer, and an output layer.

An autoencoder's target output equals its input, so the hidden layer must learn a compressed code from which the input can be rebuilt. The reconstruction error is minimized using back-propagation, which computes an "error signal" at the output units, defined as the difference between the actual and desired output values, and propagates it backwards through the network.

Image source: https://thesai.org/Downloads/Volume9No1/Paper_3-Credit_Card_Fraud_Detection_Using_Deep_Learning.pdf

In the autoencoder setup, the acquiring bank forwards the transaction features (the amount of money, the date and time, the location of internet use, and other information) as input. The autoencoder is first trained on past behavior, and each newly arriving transaction is then treated as a validation test. The autoencoder does not need labeled transactions for training, because it is an unsupervised learning method.
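As an illustration, here is a minimal sketch of that workflow in Keras. The layer sizes, the error threshold, and the placeholder arrays standing in for preprocessed transaction features are assumptions for the example, not the exact setup of the cited paper.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed inputs: rows are transactions, columns are numeric features
# (amount, time, location encoding, ...) scaled to [0, 1].
n_features = 30                               # hypothetical feature count
X_normal = np.random.rand(1000, n_features)   # stand-in for historical normal transactions
X_new = np.random.rand(10, n_features)        # stand-in for incoming transactions

# Encoder compresses the input to a latent code; decoder reconstructs it.
autoencoder = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(14, activation="relu"),      # encoder
    layers.Dense(7, activation="relu"),       # latent-space representation
    layers.Dense(14, activation="relu"),      # decoder
    layers.Dense(n_features, activation="sigmoid"),
])

# Train the network to reproduce normal transactions: input == target.
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X_normal, X_normal, epochs=20, batch_size=32, verbose=0)

# Score new transactions by reconstruction error; unusually large errors
# indicate behavior the model never saw during training, i.e. possible fraud.
reconstructions = autoencoder.predict(X_new)
errors = np.mean((X_new - reconstructions) ** 2, axis=1)
threshold = 0.05                              # assumed; tune on validation data
flagged = errors > threshold
```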

RBM

Restricted Boltzmann machines (RBMs) are a special type of Boltzmann machine that falls under the category of energy-based models. An RBM is a generative stochastic model consisting of two layers, a visible layer and a hidden layer, connected by a symmetric bipartite graph (a bipartite graph is a graph whose vertices can be divided into two disjoint, independent sets U and V such that every edge connects a vertex in U to one in V). The name "restricted" comes from the fact that there are no intra-layer connections: every node in the visible layer is connected to every node in the hidden layer, but no two nodes in the same layer are connected to each other.

Image source: https://towardsdatascience.com/deep-learning-meets-physics-restricted-boltzmann-machines-part-i-6df5c4918c15

The design of RBMs differs from other deep learning models in that there is no output layer: the output of an RBM is the reconstruction projected back onto the input (visible) layer.

The point of RBMs is the way they learn, by themselves, to reconstruct the data.

An RBM takes each transaction forwarded by the acquiring bank as visible input and passes it to the hidden nodes; after computing the hidden activations, it reconstructs the input by transferring those activations back through the same weights to the visible layer.
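This visible-to-hidden-to-visible pass can be sketched in a few lines of NumPy. The sigmoid activations and weight shapes follow the standard Bernoulli RBM; the layer sizes and variable names are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 30, 10                      # assumed layer sizes
W = rng.normal(0, 0.01, (n_visible, n_hidden))    # weights: visible <-> hidden only
b = np.zeros(n_visible)                           # visible-layer bias
c = np.zeros(n_hidden)                            # hidden-layer bias

v = rng.random(n_visible)                         # stand-in for one transaction's features

# Visible -> hidden: probability that each hidden unit turns on.
h_prob = sigmoid(v @ W + c)
h = (rng.random(n_hidden) < h_prob).astype(float) # sample binary hidden states

# Hidden -> visible: reconstruct the input through the same weights.
v_recon = sigmoid(h @ W.T + b)

# A large gap between v and v_recon indicates an unusual transaction.
reconstruction_error = np.mean((v - v_recon) ** 2)
```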

Drawbacks of RBM

RBMs have a drawback related to the partition function in the model's energy-based formulation: it makes computing the log-likelihood under the model intractable, so we cannot even track the loss we actually care about. With an autoencoder, in contrast, one can at least track the cross-entropy that the model's learning algorithm, back-propagation of errors, is minimizing.
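In practice, one therefore monitors tractable proxies such as the reconstruction error, or the free energy F(v), which determines log p(v) up to the unknown constant log Z (this quantity appears, for example, in the deeplearning.net tutorial listed in the references). A minimal sketch, reusing the W, b, c conventions of the snippet above:

```python
def free_energy(v, W, b, c):
    """Free energy of a Bernoulli RBM: log p(v) = -F(v) - log Z.

    Differences in F between inputs are meaningful for comparison
    even though log Z itself is unknown.
    """
    return -v @ b - np.sum(np.log1p(np.exp(v @ W + c)))
```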

Training of RBM

The RBM is a parametric model of the joint distribution between the hidden variables and the observable inputs. To train the model, we need to find values for the parameters θ = (W, b, c), the weights and the visible and hidden biases, that minimize the energy of the training data.
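For reference, the energy of the standard Bernoulli RBM with visible units v, hidden units h, weights W, and biases b, c (the form used in the deeplearning.net tutorial in the references) is:

```latex
E(v, h) = -\, b^{\top} v \;-\; c^{\top} h \;-\; h^{\top} W v,
\qquad
p(v, h) = \frac{e^{-E(v, h)}}{Z},
\quad
Z = \sum_{v, h} e^{-E(v, h)}
```

The partition function Z, which sums over every possible configuration, is exactly the quantity that makes the log-likelihood intractable, as discussed above.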

Since the RBM is an energy-based model, it can be trained by performing stochastic gradient descent on the negative log-likelihood of the training data. However, computing the gradient analytically is intractable, as it involves an expectation over all possible configurations of the input. Instead, the gradient can be estimated with a method called contrastive divergence (CD), introduced by Hinton (2002). CD replaces the expectation with a sample obtained from a limited number of Gibbs sampling steps (Gibbs sampling is a Markov chain Monte Carlo (MCMC) algorithm used to obtain a sequence of observations when direct sampling is difficult).

To speed up the sampling process, contrastive divergence initializes the Markov chain with a training example (from a distribution that is expected to be close to the desired one) and does not wait for the chain to converge: samples are obtained after only k steps of Gibbs sampling.
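A minimal CD-k update for a Bernoulli RBM could look like the following sketch, reusing the sigmoid, W, b, c conventions from the earlier snippet; the learning rate and the mean-field reconstruction are simplifying assumptions.

```python
def cd_k_update(v0, W, b, c, k=1, lr=0.01, rng=None):
    """One contrastive divergence (CD-k) update from a batch v0 of shape (n, n_visible)."""
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden probabilities driven by the training data.
    h0_prob = sigmoid(v0 @ W + c)
    vk, hk_prob = v0, h0_prob
    # k steps of Gibbs sampling starting from the training batch;
    # we do not wait for the chain to converge.
    for _ in range(k):
        hk = (rng.random(hk_prob.shape) < hk_prob).astype(float)
        vk = sigmoid(hk @ W.T + b)            # mean-field reconstruction
        hk_prob = sigmoid(vk @ W + c)
    n = v0.shape[0]
    # Gradient estimate: data statistics minus model (k-step chain) statistics.
    W += lr * (v0.T @ h0_prob - vk.T @ hk_prob) / n
    b += lr * (v0 - vk).mean(axis=0)
    c += lr * (h0_prob - hk_prob).mean(axis=0)
    return W, b, c

# Usage over mini-batches of training data, e.g.:
# W, b, c = cd_k_update(X_batch, W, b, c, k=1)
```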

Other Applications

RBMs started to gain popularity after their outstanding performance in the Netflix Prize, an open competition for the best collaborative filtering algorithm to predict user ratings for films based only on previous ratings, without any other information about the users or films. RBMs slightly outperformed carefully-tuned SVD models, combining the two achieved an error rate over 6% better than the score of Netflix's own system, and RBMs later formed part of the winning ensemble.

RBMs have many applications, including dimensionality reduction, feature extraction, collaborative filtering, classification, and topic modelling. In addition, they can be trained in a supervised or unsupervised fashion depending on the task.
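For the feature-extraction and classification use cases, scikit-learn ships a ready-made BernoulliRBM that can be dropped into a pipeline; the data below is random placeholder input (scikit-learn's RBM expects features scaled to [0, 1]).

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X = np.random.rand(200, 30)           # placeholder features scaled to [0, 1]
y = np.random.randint(0, 2, 200)      # placeholder binary labels

# Unsupervised RBM feature extraction feeding a supervised classifier.
model = Pipeline([
    ("rbm", BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X, y)
```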

References

  1. U. Fiore, et al., Network anomaly detection with the restricted Boltzmann machine, Neurocomputing (2013), http://dx.doi.org/10.1016/j.neucom.2012.11.050
  2. http://deeplearning.net/tutorial/rbm.html#equation-energy2
  3. https://towardsdatascience.com/deep-learning-meets-physics-restricted-boltzmann-machines-part-i-6df5c4918c15
  4. https://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf
  5. https://medium.com/datadriveninvestor/deep-learning-restricted-boltzmann-machine-b76241af7a92
  6. https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine
  7. https://www.youtube.com/watch?v=FsAvo0E5Pmw
  8. Apapan Pumsirirat and Liu Yan, “Credit Card Fraud Detection using Deep Learning based on Auto-Encoder and Restricted Boltzmann Machine” International Journal of Advanced Computer Science and Applications(IJACSA), 9(1), 2018. http://dx.doi.org/10.14569/IJACSA.2018.090103
  9. https://www.datascience.com/blog/fraud-detection-with-tensorflow
