Fraud detection in retail with graph analysis
Fraud detection is all about connecting the dots. We are going to see how to use graph analysis to identify stolen credit cards and fake identities. For the purpose of this article we have worked with Ralf Becher, irregular.bi. Ralf is Qlik Luminary and he provides solutions to integrate the Graph approach into Business Intelligence solutions like QlikView and Qlik Sense.
Third party fraud occurs when a criminal uses someone else’s identity to commit fraud. For a typical retail operation this takes the form of individuals or groups of individuals using stolen credit card to purchase high-value items.
Fighting it is a challenge. In particular, it means having a capability to detect potential fraud cases in large datasets and a capability to distinguish between real cases and false positives (the cases that look suspicious but are legitimate).
Traditional fraud detection systems focus on threshold related to customers activities. Suspicious activities include for example multiple purchases of the same product, high number of transactions per person or per credit card.
Graph analysis can add an extra layer of security by focusing on the relationships between fraudsters or fraud cases. It helps identify fraud cases that would otherwise go undetected…until too late. We recently explained how to use graph analysis to identify stolen credit cards.
For the this article, we have prepared a dummy dataset typical of an online retail operation. It includes:
To analyse the connections in our data, we stored it in a Neo4j, the leading graph database. The graph approach lies in modelling data as nodes and edges. Here is a schema of our data represented as a graph:
Now that the data is stored in Neo4j, we can analyse it.
First of all we need to set a benchmark for what’s normal. Here is an example of a transaction:
Now that we have an idea of what not to look we can start thinking about patterns specifically associated with fraud. One such pattern is a personal piece of information (IP, email, credit card, address) associated with multiple persons.
Neo4j includes a graph query language called Cypher that allows us to detect such a pattern.
Posted on 7wData.be.