Data Science in the Real World

Credit Card Fraud Detection using Self-Organizing Feature Maps

Yashwant Kumar
7 min read · Sep 17, 2018

What are self-organising feature maps?

A self-organizing map (SOM) is a type of artificial neural network (ANN) that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map, and is therefore a method of dimensionality reduction. Self-organizing maps differ from other artificial neural networks in that they apply competitive learning as opposed to error-correction learning (such as backpropagation with gradient descent), and in that they use a neighbourhood function to preserve the topological properties of the input space.

The SOM was introduced by the Finnish professor Teuvo Kohonen in the 1980s and is therefore sometimes called a Kohonen map.

What really happens in a SOM?

Each data point in the data set identifies itself by competing for representation. SOM training starts with initializing the weight vectors. From there, a sample vector is selected at random, and the map of weight vectors is searched to find the weight that best represents that sample. Each weight vector has neighbouring weights that are close to it. The chosen weight is rewarded by being allowed to become more like the randomly selected sample vector, and its neighbours are rewarded in the same way. This allows the map to grow and form different shapes. Most commonly, the maps form square, rectangular, hexagonal, or L shapes in 2D feature space.

Reference: Applications of the growing self-organizing map, Th. Villmann, H.-U. Bauer, May 1998

The Algorithm:

  1. Each node’s weights are initialized.
  2. A vector is chosen at random from the set of training data.
  3. Every node is examined to calculate which one’s weights are most like the input vector. The winning node is commonly known as the Best Matching Unit (BMU).
  4. Then the neighbourhood of the BMU is calculated. The number of neighbours decreases over time.
  5. The winning weight is rewarded by becoming more like the sample vector. Its neighbours also become more like the sample vector: the closer a node is to the BMU, the more its weights are altered; the farther away it is, the less it learns.
  6. Repeat from step 2 for N iterations.

The Best Matching Unit is found by running through all weight vectors and calculating the distance from each weight to the sample vector. The weight with the shortest distance is the winner. There are numerous ways to measure this distance; the most commonly used is the Euclidean distance, and that is what is used in the following implementation.
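The steps above can be sketched in plain NumPy. This is a toy illustration only, not the minisom implementation used later; the grid size, learning rate, and exponential decay schedule are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

grid_x, grid_y, dim = 5, 5, 2
weights = rng.random((grid_x, grid_y, dim))   # step 1: random weight init
data = rng.random((100, dim))                 # toy training samples

# Grid coordinates of every node, used by the neighbourhood in steps 4-5.
coords = np.stack(np.meshgrid(np.arange(grid_x), np.arange(grid_y),
                              indexing="ij"), axis=-1)

n_iter, lr0, sigma0 = 500, 0.5, 2.0
for t in range(n_iter):
    sample = data[rng.integers(len(data))]    # step 2: pick a random sample

    # Step 3: Euclidean distance to every node; the closest node is the BMU.
    dists = np.linalg.norm(weights - sample, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)

    # Steps 4-5: a Gaussian neighbourhood that shrinks as training proceeds;
    # nodes nearer the BMU are pulled more strongly towards the sample.
    lr = lr0 * np.exp(-t / n_iter)
    sigma = sigma0 * np.exp(-t / n_iter)
    grid_dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))
    weights += lr * h[..., None] * (sample - weights)  # step 6: repeat
```

After the loop, nodes that won similar samples end up with similar weights, which is the clustering behaviour the article relies on.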

Cons of Kohonen Maps:

  1. It does not build a generative model for the data, i.e., the model does not understand how the data is created.
  2. It does not behave well with categorical data, and even worse with mixed-type data.
  3. Preparing the model is slow, and it is hard to train against slowly evolving data.

Credit Card Fraud Detection using Self Organising Feature Maps

Gathering Data

Data for this project can be found at the UCI repository.

Data Set Information:

The file concerns credit card applications. All attribute names and values have been changed to meaningless symbols to protect the confidentiality of the data.

This dataset is interesting because there is a good mix of attributes — continuous, nominal with small numbers of values, and nominal with larger numbers of values. There are also a few missing values.

You can find the dataset here.

Code for project

Please find my code here

The project's GitHub repository mainly contains two files:

  1. minisom.py : contains the definition of the Self-Organising Map.
  2. SOM.ipynb : contains the code for detecting credit card frauds.

Let's dive into the code in the SOM.ipynb file.

1. Reading Dataset
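Reading the data can be sketched with pandas. The inline CSV and column values below are made-up placeholders (the actual UCI file uses anonymized attributes, and the real notebook would call `pd.read_csv` on the downloaded file):

```python
import io
import pandas as pd

# A two-row stand-in for the UCI credit-approval CSV; values are invented.
csv_text = (
    "CustomerID,A1,A2,A3,Class\n"
    "1001,0.22,0.71,0.05,1\n"
    "1002,0.91,0.13,0.88,0\n"
)
dataset = pd.read_csv(io.StringIO(csv_text))

# Keep the customer id with the features so flagged records can be traced
# back to a customer; the class label is held out separately.
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
print(X.shape, y.shape)  # → (2, 4) (2,)
```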

2. Exploring the Dataset: The dataset contains 14 features, A1 to A14, plus a class label. We will use these features to predict credit card frauds. Class 1 represents customers whose applications were approved, and class 0 represents the frauds.

3. Min-Max Scaling: In this approach, the data is scaled to a fixed range, usually 0 to 1. We end up with smaller standard deviations, which can suppress the effect of outliers.
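Min-max scaling is available out of the box in scikit-learn; the toy columns below are illustrative:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two toy columns on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Rescale each column to the fixed range [0, 1]:
#   x_scaled = (x - x_min) / (x_max - x_min)
sc = MinMaxScaler(feature_range=(0, 1))
X_scaled = sc.fit_transform(X)
print(X_scaled[:, 0])  # → [0.  0.5 1. ]
```

Note that the scaler learns each column's min and max independently, so columns with wildly different magnitudes end up comparable, which matters for the Euclidean distances the SOM computes.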

4. Self-organising maps employ unsupervised learning. We feed the patterns/records to the network, and records with similar properties are matched to nodes that lie together as clusters in the SOM grid. Records with unusual or different features/properties are matched to nodes that lie as outliers in the grid.

An object of the MiniSom class is created and initialized with random weights using the som.random_weights_init( ) function.

x and y are the dimensions of the SOM (they should not be too small, so that outliers can be detected). x*y gives the maximum number of classes/nodes in the SOM grid into which the input patterns/records from the dataset can be classified.

input_len is the number of features of X (the customer id is included so that the ids of the fraudulent applicants can be found).
sigma is the neighbourhood radius and learning_rate is the rate at which the SOM learns.

The som.train_random( ) function trains the SOM on the provided input patterns X for 100 iterations. In each iteration a record is selected at random from the given patterns and fed to the network. The winner node (the node whose Euclidean distance to the given input pattern is minimum) is the predicted class for that pattern. The weights (features) of the winner node are then updated so that its Euclidean distance to the input pattern decreases. The weights of the winner's neighbours are updated in the same direction, with nodes closer to the winner updated more than those that lie farther away.

After the training process, every input pattern/record of the dataset is matched to one of the nodes in the SOM grid. Nodes with similar properties lie close together, and those with different properties lie at a greater distance from the other nodes (these are the outliers in the SOM grid).

The features given in the dataset, A1 to A14, include parameters like the place of the transaction, the time taken for the transaction, how different the amount of the transaction is from the user's usual transaction amount, etc.

Transactions (records/patterns) which are not fraudulent match nodes in the SOM grid that lie close to each other, as they possess similar properties/features. Fraudulent transactions match nodes that lie as outliers in the SOM grid, as they possess unusual properties/features, such as a different transaction time or a transaction amount much larger than the user's usual amount.

So, we can determine the frauds by finding the patterns/records/transactions that were matched to the outlier nodes in the SOM grid.

The records predicted as frauds are not necessarily real frauds. They may have been flagged due to unusual behaviour of the user. If this is the case, we may apply additional security measures to verify the user.

5. We then calculate the distance between the neurons/ nodes of the SOM grid.

Nodes between which the inter-neuron distance is minimal lie in a cluster, and those for which the inter-neuron distance is large lie as outliers.

The patterns/records matched to these outliers are most probably frauds.

White boxes in the above distance map correspond to nodes for which the mean inter-neuron distance is large. Black represents the minimum mean inter-neuron distance and white the maximum.

6. Find the patterns/records that were matched to the nodes with the largest mean inter-neuron distance. In the SOM.ipynb notebook, I selected all the nodes for which the mean inter-neuron distance is greater than or equal to 0.5, and the patterns matched to those nodes are considered frauds.

7. Our model achieved an accuracy of 81.72% for fraud detection, and the affected population (records considered fraud that were not actually fraud) was found to be 22.60%. The results may vary if you rerun the same Jupyter notebook, because the weights of the SOM grid's nodes are initialized by randomly selecting records/patterns from the input space, i.e., randomly selecting records from the given dataset. Since training runs for 100 iterations and the weights are randomly initialized every time, convergence may vary. You may try different iteration counts, like 100, 150, or 200, for better convergence, and store the weights of the SOM run that achieves the best accuracy.

Another important thing to consider is that training was done on a very small dataset. For better results, try a bigger dataset that can be split into training and test data.

Refer to this similar, larger dataset on Kaggle:

Try playing with this dataset on your own; you may contact me with further queries via LinkedIn or Gmail (yashwant2451@gmail.com).
