Investigating credit card fraud through PaySim, Neo4j, & GraphXR

Published in Kineviz, Jan 12, 2022

According to the December 2021 Nilson Report, credit card fraud already costs financial services companies tens of billions of dollars a year and is projected to cause $408.50 billion in losses worldwide over the next decade. Meeting that challenge requires more efficient fraud detection systems. One approach is an automated fraud detection system that leverages data visualization and analytics in a graph data ecosystem. Such a system can help organizations build a cybercrime defense that reduces the time, money, and resources needed to carry out fraud investigations at scale.

Verifying fraud across thousands of accounts and their associated transactions is a difficult problem, and one that presents ever-changing targets, because fraudsters constantly change tactics to evade detection. As a result, structuring an effective fraud detection workflow is not a simple task unless you use the right tools.

Starting with PaySim

To begin, let’s explore what fraud may look like by investigating the client transactions in an open-source dataset built using the PaySim mobile money simulator, which provides a stream of simulated transactions, some of which are flagged as fraudulent. We can utilize Neo4j and GraphXR to model graph data and visualize patterns illustrating some of the types of fraud that can be revealed quickly in a graph environment.

The PaySim dataset is synthetic financial data derived from a real mobile money network operator, in which a mobile phone is used as an electronic wallet for making transactions. PaySim is a multi-agent simulation with three key actors: clients, merchants, and banks. Clients can change status over time to become fraudsters and/or mules, depending on their actions. Mules are clients who move money in and out of the network, while fraudsters are clients on the receiving end who manipulate other clients for their own personal gain. In this example, we’ll follow clients to see whether they also appear as fraudsters or mules in the dataset.

Identifying first-party fraud

We introduced first-party fraud in a previous blog post. First-party fraud occurs when someone uses personally identifiable information (PII) to obtain one or more high-value loans with no intention of repaying them (also known as credit muling). In the Neo4j sandbox, we identify first-party fraud by connecting clients who share PII such as a phone number, email address, and/or social security number. In this schema, we show an instance of first-party fraud in red and expand it across the whole dataset to reveal first-party fraud rings, as indicated in the graphic below.
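To make the idea of connecting clients through shared identifiers concrete, here is a minimal Python sketch. It is not the sandbox's implementation: the sample records, the `find_rings` helper, and the identifier values are all hypothetical, and the grouping uses a simple union-find over an in-memory list rather than a graph database.

```python
from collections import defaultdict

# Hypothetical sample records: (client_id, identifier_type, identifier_value).
RECORDS = [
    ("c1", "phone", "555-0100"),
    ("c2", "phone", "555-0100"),
    ("c2", "email", "a@example.com"),
    ("c3", "email", "a@example.com"),
    ("c4", "ssn", "000-00-0001"),
    ("c5", "phone", "555-0199"),
]


def find_rings(records):
    """Return groups of two or more clients connected through shared identifiers."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Map each identifier to the clients that use it.
    holders = defaultdict(list)
    for client, kind, value in records:
        find(client)  # register the client even if it shares nothing
        holders[(kind, value)].append(client)

    # Clients holding the same identifier belong to the same component.
    for clients in holders.values():
        for other in clients[1:]:
            union(clients[0], other)

    components = defaultdict(set)
    for client in parent:
        components[find(client)].add(client)
    return sorted(sorted(group) for group in components.values() if len(group) > 1)


rings = find_rings(RECORDS)
print(rings)
```

In this toy data, c1 and c2 share a phone number and c2 and c3 share an email address, so all three end up in one candidate ring even though c1 and c3 share nothing directly. That transitive linking is what makes rings easy to spot in a graph.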

Types of Transactions

In this particular PaySim mobile money dataset, transactions are carried out only by clients. Of course, in more complex networks, banks that issue credit cards and merchants that accept such transactions would add other transaction types, such as refunds and chargebacks. Nonetheless, this transactional data gives us insight into the types of fraudulent transactions that can occur. With an interactive dashboard in GraphXR, as shown below, we can chart these transactions and quantify their weight side by side with the graph schema.

*These transaction definitions are provided by Dave Voutila, a Neo4j engineer, via this thorough blog.*

Now that we’ve identified first-party fraud, the next step is to pull the transaction data into our analysis. The following Cypher query does just that:

MATCH p=(:Client:FirstPartyFraudster)-[]-(:Transaction)-[]-(c:Client)
WHERE NOT c:FirstPartyFraudster
RETURN p;

In GraphXR, the following view emerges. Here, we’ve scaled node size by transaction amount: the bigger the node, the larger the transaction. It is interesting to note that some clients are linked to two or more first-party fraudsters and are perhaps worth further investigation.
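The observation about clients linked to several first-party fraudsters can be checked outside the visualization too. The following Python sketch assumes the query results have been flattened into simple (sender, receiver, amount) tuples; that shape, the sample data, and the `fraudster_links` helper are hypothetical and only illustrate the counting logic.

```python
from collections import defaultdict

# Hypothetical transactions: (sender, receiver, amount).
TRANSACTIONS = [
    ("f1", "c1", 120.0),
    ("c1", "f2", 80.0),
    ("f1", "c2", 40.0),
    ("c3", "c2", 15.0),
    ("f2", "f1", 300.0),
]
FRAUDSTERS = {"f1", "f2"}  # clients already labelled as first-party fraudsters


def fraudster_links(transactions, fraudsters):
    """Map each non-fraudster client to the set of fraudsters it transacted with."""
    links = defaultdict(set)
    for sender, receiver, _amount in transactions:
        # Direction is ignored, mirroring the undirected pattern in the query.
        for a, b in ((sender, receiver), (receiver, sender)):
            if a in fraudsters and b not in fraudsters:
                links[b].add(a)
    return dict(links)


links = fraudster_links(TRANSACTIONS, FRAUDSTERS)
# Clients that touch two or more distinct fraudsters are candidates for review.
repeat_contacts = sorted(c for c, fs in links.items() if len(fs) >= 2)
print(repeat_contacts)
```

Counting distinct fraudsters rather than transactions keeps a client who transacts many times with a single fraudster from being flagged on volume alone.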

Node Similarity on First-Party Fraudsters

We can then use the Node Similarity graph algorithm in Neo4j to measure how similar two clients are based on the neighbors they share. Node Similarity computes a pairwise score with the Jaccard index (the number of shared neighbors divided by the total number of distinct neighbors of the two nodes) and stores it as a property, the firstPartyFraudScore. Visualized, this score can indicate whether other, so far undetected clients are worth investigating, either as an extension of the fraudulent network or as common victims of fraud.
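As a rough illustration of the Jaccard index behind this score, here is a small Python sketch. The neighbor sets and function names are hypothetical. It shows only the pairwise calculation on in-memory sets, not the Neo4j procedure or how the sandbox turns similarities into the firstPartyFraudScore.

```python
from itertools import combinations

# Hypothetical neighbor sets: the identifiers each client is linked to.
NEIGHBORS = {
    "c1": {"phone:555-0100", "email:a@example.com"},
    "c2": {"phone:555-0100", "email:a@example.com", "ssn:000-00-0001"},
    "c3": {"ssn:000-00-0001"},
}


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(neighbors):
    """Score every unordered pair of clients by the overlap of their neighbor sets."""
    return {
        (u, v): jaccard(neighbors[u], neighbors[v])
        for u, v in combinations(sorted(neighbors), 2)
    }


scores = pairwise_similarity(NEIGHBORS)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Here c1 and c2 share two of three distinct identifiers and score about 0.667, while c1 and c3 share none and score 0. A threshold on such scores is one possible way to decide which pairs deserve a closer look.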

The same method can help estimate how likely a client is to be a second-party fraudster. For example, once the transactions are linked in, we can observe how clients are given a secondPartyFraudScore based on their association with first-party fraudsters.

While many academics and hackers have expanded upon this dataset using graph analytics and ML, it is important to note that the data itself has been criticized for being overly simplistic. Depicting fraud scenarios as realistically as possible is no easy feat, but being able to get started with a freely available tool such as this one is a highly effective point of entry into exploring this complex problem.

This analysis was inspired by Fraud Detection Using Neo4j Platform and PaySim Dataset. Learn more about graph-based fraud detection and be a part of the research. Contact the Kineviz team today!

Resources

“Card Fraud Losses Reach $28.65 Billion.” Nilson Report, https://nilsonreport.com/mention/1313/1link/.

Mullen, Caitlin. “Card Industry Faces $400B in Fraud Losses over next Decade, Nilson Says.” Payments Dive, 14 Dec. 2021, https://www.paymentsdive.com/news/card-industry-faces-400b-in-fraud-losses-over-next-decade-nilson-says/611521/.

Voutila, Dave. “Simulating Mobile Money Fraud 🤑 (Paysim PT.1).” Dave Voutila, 23 Mar. 2020, https://www.sisu.io/posts/paysim/.