Follow the Money Part I

Weidong Yang
Kineviz
5 min read · Aug 31, 2022


Analyzing the flow of money through financial transactions can reveal fraudulent activity, as well as the inner workings of collusion among organized crime groups. Financial transactions often produce large, noisy data sets, requiring laborious effort by analysts to cut through the noise and “connect the dots.” Adopting graph data technology can greatly improve the efficiency of this process: graph algorithms can help with clustering, path finding, and identifying important patterns in the data.

Bad actors go to great lengths to hide their tracks, mingling a few illegal transactions with many legitimate ones to muddy the waters. As a result, algorithms alone can't reveal the full picture of any given account's activity. Multi-perspective, dynamic visualization techniques provide context that's crucial for finding the right questions to ask of intentionally obfuscated data.

In this blog post, I will demonstrate effective visualization techniques for this kind of complex analysis. The data set simulates global money transfers based on real-world examples. I'll cover techniques that identify patterns of movement, the types of connections, and the accounts where money eventually aggregates. We can observe specific stories that algorithms can't capture, and we can compare multiple perspectives to pull clues from context. We will use GraphXR for the demonstrations, but the strategies covered apply to other visualization tools as well. A skilled analyst could even carry them out in Python, JS, or R.

Effective exploratory analysis depends on a quick and painless path from data to analysis (a subject for a future blog post). Here, our raw data is a transaction log in .csv format. Each row corresponds to one transaction, and the columns hold {sender, receiver, date, amount}. When we drag and drop the .csv file into GraphXR, each transaction is loaded as a node in the graph space. Just by mousing over a node, we can examine the data embedded in it.
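For readers following along in Python rather than GraphXR, here is a minimal sketch of the same first step: parsing the transaction log into one record per row, the equivalent of one Transaction node per row. The sample rows, the `Transaction` class, and `load_transactions` are hypothetical, and the sketch assumes the header row is literally sender,receiver,date,amount.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Transaction:
    """One row of the transaction log, i.e. one Transaction node."""
    sender: str
    receiver: str
    date: str
    amount: float


def load_transactions(csv_text: str) -> list[Transaction]:
    """Parse CSV text with a sender,receiver,date,amount header into records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        Transaction(row["sender"], row["receiver"], row["date"], float(row["amount"]))
        for row in reader
    ]


# Hypothetical sample rows standing in for the real (simulated) data set.
SAMPLE_CSV = """sender,receiver,date,amount
Alice,Bob,2022-01-03,1200.00
Bob,Carol,2022-01-04,900.50
Dave,Carol,2022-01-05,300.00
"""

transactions = load_transactions(SAMPLE_CSV)
for t in transactions:
    print(t)
```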

The Labeled Property Graph (LPG) Data Model, popularized by Neo4j, is well suited for this kind of analysis. While GraphXR is LPG compatible, we discourage assigning multiple labels to a single node. Keeping one label per node streamlines the model and improves interoperability with the Relational Data Model, making the graph easy to use and robust to manage.

We’ll use the Cypher query language to describe the schema, but GraphXR’s Transforms to modify it. These no-code operators (Map, Extract, Aggregate, Shortcut, and Link) echo the Map, Reduce, and Group workflow developed for big data analysis and provide a fast, intuitive interface for working with LPG-modeled graph data.

First, we use Extract to create a new Person category from the {sender, receiver} columns. Identical persons are automatically merged, evolving the graph to a new schema:

(:Person)<-[:FROM_PERSON]-(:Transaction)-[:TO_PERSON]->(:Person)
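Outside GraphXR, the effect of Extract can be approximated in a few lines of Python. This is a minimal sketch under our own assumptions, not how GraphXR implements it: keying Person nodes on the name string is what merges identical persons, and `extract_persons` and the sample rows are hypothetical.

```python
# Hypothetical sample rows: (sender, receiver, date, amount)
transactions = [
    ("Alice", "Bob", "2022-01-03", 1200.0),
    ("Bob", "Carol", "2022-01-04", 900.5),
    ("Dave", "Carol", "2022-01-05", 300.0),
]


def extract_persons(rows):
    """Approximate an Extract step: turn sender/receiver columns into Person nodes.

    Persons are keyed by name, so identical names collapse into one node.
    Each edge is (transaction_index, relationship_type, person), pointing
    from a Transaction node to a Person node.
    """
    persons = set()
    edges = []
    for i, (sender, receiver, _date, _amount) in enumerate(rows):
        persons.add(sender)
        persons.add(receiver)
        edges.append((i, "FROM_PERSON", sender))   # (:Transaction)-[:FROM_PERSON]->(:Person)
        edges.append((i, "TO_PERSON", receiver))   # (:Transaction)-[:TO_PERSON]->(:Person)
    return persons, edges


persons, edges = extract_persons(transactions)
print(sorted(persons))  # ['Alice', 'Bob', 'Carol', 'Dave']
print(len(edges))       # 6
```

Bob appears as both a receiver and a sender but ends up as a single Person node, which is the merging behavior described above.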

Next, we can look at the total transactional amounts sent and received by each person. We do this by using Aggregate to add up the {amount} properties from all adjacent transaction nodes and create the properties totalAmountSent and totalAmountReceived for each person. Here, the size of the node represents totalAmountReceived.
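The Aggregate step boils down to summing amounts per Person, separately for each direction. A minimal Python sketch, assuming the hypothetical sample rows below (the property names follow the post, while `aggregate_totals` is our own illustration):

```python
from collections import defaultdict

# Hypothetical sample rows: (sender, receiver, date, amount)
transactions = [
    ("Alice", "Bob", "2022-01-03", 1200.0),
    ("Bob", "Carol", "2022-01-04", 900.5),
    ("Dave", "Carol", "2022-01-05", 300.0),
]


def aggregate_totals(rows):
    """Sum the amount of every adjacent transaction, split by direction."""
    sent = defaultdict(float)
    received = defaultdict(float)
    for sender, receiver, _date, amount in rows:
        sent[sender] += amount        # outgoing, reached via FROM_PERSON
        received[receiver] += amount  # incoming, reached via TO_PERSON
    persons = set(sent) | set(received)
    return {
        p: {"totalAmountSent": sent[p], "totalAmountReceived": received[p]}
        for p in sorted(persons)
    }


totals = aggregate_totals(transactions)
for person, props in totals.items():
    print(person, props)
```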

Comparing this picture with the next one, which shows totalAmountSent as node size, one Person stands out as receiving the most while sending very little. Can you spot the node? This Person sits at the receiving end of many transactions: most of the edges connecting to it are :TO_PERSON relationships.

A few Persons are high on both sending and receiving, acting like bridges in the middle of paths. To see them more clearly, let’s create a scatter plot in GraphXR’s parametric view with totalAmountSent on the X axis and totalAmountReceived on the Y axis:

This gives a sense of how balanced each person is in the money transfer network, potentially hinting at their role. In particular, dots lying on the Y axis only receive money, while dots in the upper right both send and receive.
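Reading roles off the scatter plot can be roughed out in code. The following is a minimal sketch only: the labels and the 10% `tolerance` cut-off are hypothetical choices of ours, not something GraphXR computes, and a real analysis would still need the visual context described above.

```python
def role_hint(sent: float, received: float, tolerance: float = 0.1) -> str:
    """Rough role label from a Person's position on the sent/received plot.

    `tolerance` is a hypothetical cut-off: someone who sends at most 10% of
    what they receive is labeled a sink (near the Y axis), and vice versa.
    """
    if sent == 0 and received == 0:
        return "inactive"
    if sent <= tolerance * received:
        return "sink"    # near the Y axis: mostly receives
    if received <= tolerance * sent:
        return "source"  # near the X axis: mostly sends
    return "bridge"      # off both axes: sends and receives comparable amounts


# Hypothetical (totalAmountSent, totalAmountReceived) per Person.
totals = {
    "Alice": (1200.0, 0.0),
    "Bob": (900.5, 1200.0),
    "Carol": (0.0, 1200.5),
    "Dave": (300.0, 0.0),
}
for person, (sent, received) in sorted(totals.items()):
    print(person, role_hint(sent, received))
# Alice source
# Bob bridge
# Carol sink
# Dave source
```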

We can also look for patterns by linking the Transaction node size to the amount property:

We can make these patterns even clearer by simplifying the graph. Let’s use the Shortcut transform to simplify the schema from:

(:Person)<-[:FROM_PERSON]-(:Transaction)-[:TO_PERSON]->(:Person)

To:

(:Person)-[:SEND_TO]->(:Person)

We’ll sum the transaction amounts into a totalAmount property on each edge and use it to set the edge width. At the same time, we’ll count the number of transactions between each pair of Person nodes.
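The Shortcut transform can be mimicked by grouping transactions on their (sender, receiver) pair. This is a minimal sketch under the assumption that SEND_TO edges are directed, so Alice to Bob and Bob to Alice stay separate; `shortcut` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows: (sender, receiver, date, amount)
transactions = [
    ("Alice", "Bob", "2022-01-03", 1200.0),
    ("Alice", "Bob", "2022-01-10", 800.0),
    ("Bob", "Carol", "2022-01-04", 900.5),
    ("Dave", "Carol", "2022-01-05", 300.0),
]


def shortcut(rows):
    """Collapse Person<-FROM_PERSON-Transaction-TO_PERSON->Person paths
    into one directed SEND_TO edge per (sender, receiver) pair."""
    edges = defaultdict(lambda: {"totalAmount": 0.0, "count": 0})
    for sender, receiver, _date, amount in rows:
        edge = edges[(sender, receiver)]
        edge["totalAmount"] += amount  # later drives the edge width
        edge["count"] += 1             # number of transactions on this edge
    return dict(edges)


send_to = shortcut(transactions)
for (sender, receiver), attrs in send_to.items():
    print(f"{sender} -[:SEND_TO]-> {receiver}", attrs)
```

Four Transaction nodes become three SEND_TO edges here, because the two Alice-to-Bob transfers merge into one edge with a count of 2.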

This is much easier to read than before. Clarity improves further when we place the most central person (the one who received the most transactions) at the center of a geometric ring layout:
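GraphXR handles this layout interactively; as a rough stand-in, the geometry can be sketched in Python: pick the Person who is the receiver in the most transactions, pin them at the origin, and space everyone else evenly on a circle. `ring_layout`, the unit radius, and the sample rows are hypothetical, and GraphXR's own layout may place nodes differently.

```python
import math
from collections import Counter

# Hypothetical sample rows: (sender, receiver, date, amount)
transactions = [
    ("Alice", "Bob", "2022-01-03", 1200.0),
    ("Bob", "Carol", "2022-01-04", 900.5),
    ("Dave", "Carol", "2022-01-05", 300.0),
    ("Eve", "Carol", "2022-01-06", 450.0),
]


def ring_layout(persons, hub, radius=1.0):
    """Place `hub` at the origin and the other persons evenly on a circle."""
    others = sorted(p for p in persons if p != hub)
    positions = {hub: (0.0, 0.0)}
    for i, person in enumerate(others):
        angle = 2 * math.pi * i / len(others)
        positions[person] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


# The hub is the Person who received the most transactions (by count).
received_counts = Counter(receiver for _s, receiver, _d, _a in transactions)
hub = received_counts.most_common(1)[0][0]

persons = {s for s, _r, _d, _a in transactions} | {r for _s, r, _d, _a in transactions}
layout = ring_layout(persons, hub)
for person, (x, y) in sorted(layout.items()):
    print(f"{person}: ({x:.2f}, {y:.2f})")
```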

The flow of money becomes quite clear. Next, we’ll apply a filter to hide edges whose totalAmount falls below a threshold. This reveals the “backbone” of the network:
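The threshold filter is a one-line predicate over the SEND_TO edges. A minimal sketch, assuming edges shaped like the output of a Shortcut step; the edge values, the 500.0 threshold, and `backbone` are hypothetical.

```python
# Hypothetical SEND_TO edges keyed by (sender, receiver).
send_to = {
    ("Alice", "Bob"): {"totalAmount": 2000.0, "count": 2},
    ("Bob", "Carol"): {"totalAmount": 900.5, "count": 1},
    ("Dave", "Carol"): {"totalAmount": 300.0, "count": 1},
}


def backbone(edges, threshold):
    """Keep only edges whose totalAmount is at or above `threshold`."""
    return {
        pair: attrs
        for pair, attrs in edges.items()
        if attrs["totalAmount"] >= threshold
    }


strong = backbone(send_to, threshold=500.0)
print(sorted(strong))  # [('Alice', 'Bob'), ('Bob', 'Carol')]
```

In practice the threshold is something to tune interactively: raise it until the minor edges drop out and the main channels remain.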

Another perspective can be gained by applying a pathfinding algorithm to identify the shortest connection between two Person nodes:
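The post doesn't name the pathfinding algorithm used here. As an illustration, breadth-first search finds a path with the fewest hops when edges are unweighted; the sketch below follows SEND_TO direction by default, with an option to ignore it. `shortest_path`, the `directed` flag, and the sample edges are hypothetical.

```python
from collections import deque

# Hypothetical SEND_TO edges as (sender, receiver) pairs.
edges = [
    ("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Erin"),
    ("Alice", "Dave"), ("Dave", "Erin"), ("Erin", "Frank"),
]


def shortest_path(edges, start, goal, directed=True):
    """Return a fewest-hops path from start to goal via BFS, or None."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
        if not directed:
            adjacency.setdefault(b, []).append(a)
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


print(shortest_path(edges, "Alice", "Frank"))  # ['Alice', 'Dave', 'Erin', 'Frank']
```

Following edge direction answers "how could money have moved from A to B?", while `directed=False` answers the looser "how are A and B connected at all?".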

We can use GraphXR’s Trace Neighbor to highlight the reach of a particular Person in the network. This graph shows connections 3 hops out from the node in the bottom left.
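The idea behind showing connections a fixed number of hops out can be sketched as a depth-limited breadth-first search. This assumes edge direction is ignored (reach in either direction counts), which may differ from GraphXR's Trace Neighbor settings; `neighbors_within` and the sample edges are hypothetical.

```python
from collections import deque

# Hypothetical SEND_TO edges as (sender, receiver) pairs.
edges = [
    ("Alice", "Bob"), ("Bob", "Carol"), ("Carol", "Dave"),
    ("Dave", "Erin"), ("Frank", "Alice"),
]


def neighbors_within(edges, origin, max_hops=3):
    """Map every node reachable within `max_hops` of origin to its hop count."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)  # ignore direction
    hops = {origin: 0}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # don't expand past the hop limit
        for nxt in adjacency.get(node, ()):
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return hops


reach = neighbors_within(edges, "Alice", max_hops=3)
print(sorted(reach.items(), key=lambda kv: (kv[1], kv[0])))
```

Because BFS visits nodes in order of distance, each hop count is the shortest distance from the origin, so Erin (four hops from Alice) is left out.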

Each of these views provides further insight into who-sent-how-much-to-whom. In Follow the Money Part II, we’ll add “where” and “when” to see what additional layers of the story are revealed by visualizing and animating the geospatial and temporal dimensions of our data.

Thank you to Sony Green and Alex Law for contributing to this blog.

Weidong is an entrepreneur, scientist, programmer, and artist. He founded Kineviz and Kinetech Arts.