Discovering Fraud Patterns using IBM Financial Crime Insights

Madhu Vasudevan
IBM Data Science in Practice
6 min read · Feb 16, 2021

This blog is written in collaboration with Sanjana Mallikarjuna, IBM, Vyoma Gajjar, IBM and Shweta Shandilya, IBM. Special thanks to Srinivasan Muthuswamy, IBM and Rachit Arora, IBM for their patient guidance and mentoring.

According to Cybersecurity Ventures, cybercrime is estimated to cost more than $6 trillion USD globally in 2021 and is expected to grow 15% annually, reaching a whopping $10.5 trillion USD by 2025. In the U.S. alone, insurance fraud amounts to $115 billion USD per year. Over the past five years, the International Consortium of Investigative Journalists (ICIJ) has published several investigations, including the Panama Papers, the Paradise Papers, the Mauritius Leaks and, most recently, the FinCEN files. These investigations reveal that financial fraud is often the result of several carefully executed transactions, carried out in collusion with other parties through identity theft, alias accounts, and offshore companies.

Have you ever wondered how to make sense of this data and the revelations it has produced about money laundering, the movement of money, and corruption? Gaining meaningful insights from such massive, dense, and interconnected data is not easy. It requires detecting indirect relationships, patterns, and connections, a task for which relational databases, which rely on primary and foreign keys to link tables, are ill-equipped. Graph traversals, on the other hand, follow relationships directly from one record to the next, which makes transaction monitoring and analysis fast, easy, and intuitive.
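To make the contrast concrete, here is a minimal sketch of a multi-hop lookup: a breadth-first search that finds every account reachable from a starting account within k hops. Each hop is a dictionary lookup on an adjacency list, whereas a relational database would typically need one self-join per hop (or a recursive query). The account names and edges are hypothetical toy data, and this is a generic illustration, not how the offering's graph store is implemented.

```python
from collections import deque


def within_k_hops(adjacency, start, k):
    """Return {node: hop_count} for every node reachable from `start` in 1..k hops."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:  # first visit is the shortest hop count in BFS
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    del hops[start]  # report only the other accounts
    return hops


# Hypothetical toy money-flow graph: account -> accounts it sent money to.
adjacency = {
    "alice": ["shell_co_1"],
    "shell_co_1": ["shell_co_2", "offshore_trust"],
    "shell_co_2": ["offshore_trust"],
    "offshore_trust": ["bob"],
}

print(within_k_hops(adjacency, "alice", 3))
# {'shell_co_1': 1, 'shell_co_2': 2, 'offshore_trust': 2, 'bob': 3}
```

The indirect link from "alice" to "bob" only appears at the third hop, which is exactly the kind of relationship that is tedious to express with key-based joins.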

The IBM Financial Crime Insights product offering does exactly this. The offering stores financial records such as accounts, entities, and organizations as nodes, and transactions and relationships as edges, mapping financial activity into an intuitive graph-based data structure. By traversing this financial network, it builds a behavioral profile of each customer; drastic deviations from these profiles can indicate suspicious activity. If you are interested in graph data structures and algorithms, more information can be found on MIT’s portal. Additional information on how to apply graphs to detect money laundering can be found in this paper published by the MIT-IBM Watson AI Lab.
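As a much simplified illustration of profile-based flagging, the sketch below builds a per-account profile (mean and population standard deviation of past outgoing amounts) from a transaction edge list and flags a new amount whose z-score exceeds a threshold. The z-score rule, the threshold of 3, the "no history means review" policy, and the toy data are all assumptions for illustration; the post does not describe how the offering actually computes its behavioral profiles.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_profiles(transactions):
    """Turn (src, dst, amount) edges into {src: (mean, std)} of outgoing amounts."""
    history = defaultdict(list)
    for src, _dst, amount in transactions:
        history[src].append(amount)
    return {acct: (mean(amts), pstdev(amts)) for acct, amts in history.items()}


def is_deviation(profiles, account, amount, threshold=3.0):
    """Flag `amount` if it lies more than `threshold` standard deviations from the mean."""
    if account not in profiles:
        return True  # hypothetical policy: an account with no history needs review
    mu, sigma = profiles[account]
    if sigma == 0:
        return amount != mu  # constant history: any change is a deviation
    return abs(amount - mu) / sigma > threshold


# Hypothetical history: acct_A usually sends about 100 USD.
txns = [("acct_A", "acct_B", a) for a in (100.0, 110.0, 90.0, 105.0, 95.0)]
profiles = build_profiles(txns)
print(is_deviation(profiles, "acct_A", 102.0))   # False
print(is_deviation(profiles, "acct_A", 5000.0))  # True
```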

In September 2020, the ICIJ, along with BuzzFeed News and other media partners, exposed the large-scale movement of money by the biggest banks in the world on behalf of drug cartels, corrupt regimes, arms traffickers, and other international criminals. The documents, known as the FinCEN files, include more than 2,100 Suspicious Activity Reports (SARs) filed by global banks with the Financial Crimes Enforcement Network (FinCEN). In this blog, we share our experience loading the FinCEN files into our offering and the key insights we found in the data.

About IBM FCI Graph Analytics

FCI Graph Analytics is hosted on IBM Cloud Pak for Data, powered by Red Hat OpenShift and Watson Data Science Tools. The offering provides two types of analytics. The first type supports exploratory analysis using the Cypher query language, which turns the offering into a Bring Your Own Query (BYOQ) platform and allows expressive and efficient querying of the graph store. This lets investigators keep up with new fraud patterns in an intuitive and efficient manner. The second type is link analysis, using out-of-the-box algorithms optimized for graph data structures, such as Risk by Association, Temporal Cycle and Mule Detection, whose outputs are fed into an ensemble model that helps detect anomalous behavior.
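The post does not describe how the Temporal Cycle algorithm works internally. As a hedged sketch of the general idea only, the code below searches for round-trip money flows: paths that leave an account and return to it, where each transfer happens strictly after the previous one. The function name, the `max_len` bound, and the toy edges are hypothetical; this is not the offering's implementation.

```python
from collections import defaultdict


def temporal_cycles(edges, max_len=4):
    """Find cycles whose edge timestamps strictly increase along the path.

    edges: iterable of (src, dst, timestamp) tuples.
    max_len: maximum number of distinct accounts in a cycle.
    Returns node paths that start and end at the same account.
    """
    adjacency = defaultdict(list)
    for src, dst, ts in edges:
        adjacency[src].append((dst, ts))

    cycles = []

    def extend(start, node, last_ts, path):
        for nxt, ts in adjacency[node]:
            if ts <= last_ts:
                continue  # money must move forward in time
            if nxt == start and len(path) >= 2:
                cycles.append(path + [start])  # closed a round trip
            elif nxt not in path and len(path) < max_len:
                extend(start, nxt, ts, path + [nxt])

    for start in list(adjacency):  # snapshot: extend() may add empty keys
        extend(start, start, float("-inf"), [start])
    return cycles


# Hypothetical transfers: (from, to, day)
edges = [
    ("A", "B", 1), ("B", "C", 2), ("C", "A", 3),  # time-ordered round trip
    ("X", "Y", 5), ("Y", "X", 4),                 # Y -> X on day 4, X -> Y on day 5
]
print(temporal_cycles(edges))
# [['A', 'B', 'C', 'A'], ['Y', 'X', 'Y']]
```

Requiring increasing timestamps is what separates a plausible round trip of funds from an arbitrary structural cycle, and it also prunes the search.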

System architecture showing a JanusGraph/HBase graph store, shipped with pre-built data ingestion, pattern detection, graph algorithms, visual fragments and a REST API-based service layer.

Schema of the data

In this section, we describe how we mapped the data from the FinCEN files to a graph data structure. Each Suspicious Activity Report, or “filing”, is filed by a “filer”. Each filing contains information on the “originator”, “beneficiary”, and “entity” of the transaction. These actors are banks located all over the world.

A graph schema depicting the FinCEN files. Filer, Filing, Entity, Originator, Beneficiary and Country are stored as vertices. Vertices Filer and Filing are connected by edge Filed, vertices Filing and Entity by edge Concerns, and vertices Originator and Beneficiary by edge Transferred. Each Filer, Entity, Originator and Beneficiary is connected to a Country by edge Located.
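The schema above can be written down as a set of allowed (source label, edge label, target label) triples. The minimal in-memory property graph below rejects any edge the schema does not permit. Edge directions (for example Filer to Filing) and all vertex ids and property names are assumptions, since the schema only says which vertices are "connected"; this is not the offering's ingestion code.

```python
from dataclasses import dataclass, field

# Allowed (source label, edge label, target label) triples from the schema.
# Directions are assumed; the schema description only says "connected by".
SCHEMA = {
    ("Filer", "Filed", "Filing"),
    ("Filing", "Concerns", "Entity"),
    ("Originator", "Transferred", "Beneficiary"),
    ("Filer", "Located", "Country"),
    ("Entity", "Located", "Country"),
    ("Originator", "Located", "Country"),
    ("Beneficiary", "Located", "Country"),
}


@dataclass
class PropertyGraph:
    vertices: dict = field(default_factory=dict)  # vertex id -> (label, properties)
    edges: list = field(default_factory=list)     # (src id, edge label, dst id, properties)

    def add_vertex(self, vid, label, **props):
        self.vertices[vid] = (label, props)

    def add_edge(self, src, label, dst, **props):
        triple = (self.vertices[src][0], label, self.vertices[dst][0])
        if triple not in SCHEMA:
            raise ValueError(f"edge not allowed by schema: {triple}")
        self.edges.append((src, label, dst, props))


# Hypothetical vertices and edges, not real FinCEN records.
g = PropertyGraph()
g.add_vertex("filer_1", "Filer", name="Example Bank")
g.add_vertex("sar_1", "Filing")
g.add_vertex("orig_1", "Originator", name="Bank O")
g.add_vertex("benef_1", "Beneficiary", name="Bank B")
g.add_vertex("country_1", "Country", name="Country C")
g.add_edge("filer_1", "Filed", "sar_1")
g.add_edge("orig_1", "Transferred", "benef_1", amount=1_000_000.0)
g.add_edge("benef_1", "Located", "country_1")
print(len(g.edges))  # 3
```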

Query Data Analysis

Once the data was loaded onto the system, we used the Cypher query framework to write queries and detect fraud patterns. This framework can be accessed by opening the “Explore” tab after logging in to the system.

Login page of FCI Graph Analytics and the Explore tab, which has an input box for executing a Cypher query.

The “Explore” tab presents a text box in which users write Cypher queries and execute them by clicking the “Run query” button. The results are presented either as a network graph or as a table. You can find more information on how to write a Cypher query here.

Let us now look at some specific examples from the data. The following is an example of a Suspicious Activity Report (SAR) in which nearly $45 million USD was transferred from JP Morgan Chase, London to Alfa Bank, Russia:

A graph of a SAR filed by JP Morgan Chase, showing $45 million USD transferred from JP Morgan Chase, London to Alfa Bank, Russia.

Cypher query for the top 5 banks that filed the most SARs:

Network representation of the top 5 banks that filed the most SARs. This list includes Standard Chartered Plc, China Investment Corporation, Barclays Plc, Deutsche Bank AG and JP Morgan Chase.
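A query of this kind might look like the Cypher string below, assuming the offering's Cypher dialect accepts standard MATCH/RETURN/ORDER BY/LIMIT clauses. The Filer/Filing labels and the Filed edge come from the schema; the `name` property is an assumption. The Python function performs the same aggregation on a hypothetical edge list so the logic can be checked without a graph store.

```python
from collections import Counter

# Hypothetical Cypher: labels from the schema, `name` property assumed.
TOP_FILERS_CYPHER = """
MATCH (f:Filer)-[:Filed]->(s:Filing)
RETURN f.name AS filer, count(s) AS sars
ORDER BY sars DESC
LIMIT 5
"""


def top_filers(filed_edges, k=5):
    """Count filings per filer from (filer, filing_id) pairs and return the top k."""
    return Counter(filer for filer, _filing in filed_edges).most_common(k)


# Hypothetical toy data, not the FinCEN counts.
filed = [("Bank A", "sar1"), ("Bank A", "sar2"), ("Bank B", "sar3"),
         ("Bank C", "sar4"), ("Bank A", "sar5"), ("Bank B", "sar6")]
print(top_filers(filed, k=2))  # [('Bank A', 3), ('Bank B', 2)]
```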

Cypher query for the banks from which the most transactions originated:

This query lists 5 banks that have more than 1,000 outgoing transactions.

Tabular representation of 5 banks where more than 1,000 transactions originated. All 5 banks are located in Russia.
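A sketch of such a query, under the same assumptions (standard Cypher clauses, schema labels, an assumed `name` property), groups Transferred edges by originator and keeps only banks above the 1,000-transaction threshold. The Python function mirrors that filter on hypothetical (originator, beneficiary, amount) tuples.

```python
from collections import Counter

# Hypothetical Cypher: count outgoing Transferred edges per Originator.
OUTGOING_CYPHER = """
MATCH (o:Originator)-[t:Transferred]->(:Beneficiary)
WITH o, count(t) AS outgoing
WHERE outgoing > 1000
RETURN o.name AS bank, outgoing
ORDER BY outgoing DESC
LIMIT 5
"""


def busy_originators(transfers, min_outgoing=1000, k=5):
    """Return up to k (bank, count) pairs for banks with more than min_outgoing transfers."""
    counts = Counter(orig for orig, _benef, _amount in transfers)
    busy = [(bank, n) for bank, n in counts.most_common() if n > min_outgoing]
    return busy[:k]


# Hypothetical toy data: 1,200 transfers from one bank, 800 from another.
transfers = [("Bank O1", "Bank B", 10.0)] * 1200 + [("Bank O2", "Bank B", 10.0)] * 800
print(busy_originators(transfers))  # [('Bank O1', 1200)]
```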

Cypher query for countries where more than $40 million USD was received:

Tabular representation of 5 countries where more than $40 million USD was received. The list includes the Netherlands, two banks in India, the Czech Republic and Sweden.
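One way to express this query is to sum transfer amounts per beneficiary country. The sketch below assumes an `amount` property in USD on the Transferred edge and a `name` property on Country; both are assumptions, as is the standard-Cypher dialect. The Python function performs the same aggregation on hypothetical data.

```python
from collections import defaultdict

# Hypothetical Cypher: `amount` (USD) on Transferred and `name` on Country are assumed.
RECEIVED_CYPHER = """
MATCH (:Originator)-[t:Transferred]->(b:Beneficiary)-[:Located]->(c:Country)
WITH c, sum(t.amount) AS received
WHERE received > 40000000
RETURN c.name AS country, received
ORDER BY received DESC
LIMIT 5
"""


def countries_over(transfers, bank_country, threshold=40_000_000, k=5):
    """Sum received amounts per beneficiary country; keep countries above threshold."""
    totals = defaultdict(float)
    for _orig, benef, amount in transfers:
        totals[bank_country[benef]] += amount
    over = [(country, total) for country, total in totals.items() if total > threshold]
    over.sort(key=lambda item: item[1], reverse=True)
    return over[:k]


# Hypothetical banks, countries, and amounts.
bank_country = {"Bank B1": "Country X", "Bank B2": "Country Y", "Bank B3": "Country X"}
transfers = [("Bank O", "Bank B1", 30_000_000.0), ("Bank O", "Bank B3", 15_000_000.0),
             ("Bank O", "Bank B2", 20_000_000.0)]
print(countries_over(transfers, bank_country))  # [('Country X', 45000000.0)]
```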

Cypher query to detect supernodes:

In the context of graphs, a supernode is a vertex with a disproportionately large number of incoming and outgoing edges. We use this query to list the 5 countries that were flagged most often in SARs.

Tabular representation of the supernode Cypher query results, listing the countries flagged most often in SARs, including China, Singapore, Switzerland and the United Kingdom.
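Since a supernode is defined by its degree, a straightforward sketch counts all edges, in either direction, attached to each Country vertex. Ranking countries by raw degree is an assumption about how the query was written; the post does not show it. The Cypher string uses an undirected pattern, and the Python function computes the same degree on a hypothetical edge list.

```python
from collections import Counter

# Hypothetical Cypher: undirected pattern counts incoming and outgoing edges.
SUPERNODE_CYPHER = """
MATCH (c:Country)-[r]-()
RETURN c.name AS country, count(r) AS degree
ORDER BY degree DESC
LIMIT 5
"""


def top_degree(edges, label_of, label="Country", k=5):
    """Count incoming plus outgoing edges for vertices with `label`; return the k highest."""
    degree = Counter()
    for src, dst in edges:
        if label_of[src] == label:
            degree[src] += 1
        if label_of[dst] == label:
            degree[dst] += 1
    return degree.most_common(k)


# Hypothetical vertices and Located edges.
label_of = {"bank1": "Entity", "bank2": "Entity", "bank3": "Entity",
            "CN": "Country", "SG": "Country"}
edges = [("bank1", "CN"), ("bank2", "CN"), ("bank3", "SG")]
print(top_degree(edges, label_of))  # [('CN', 2), ('SG', 1)]
```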

Graph Algorithm Analysis

Now, let us explore the insights that the PageRank and Degree graph algorithms reveal about the beneficiary and originator banks:

The top beneficiary banks in the FinCEN files:

Credit Suisse AG — Switzerland
Rosbank — Russia
JSC Norvik Banka — Latvia
Credit Europe Bank NV — Netherlands
Habib Bank A G Zurich — United Arab Emirates

The top originator banks that emerged from the FinCEN files:

LTB Bank — Latvia
AS Expobank — Latvia
JSC Norvik Banka — Latvia
JSC VTB Bank — Russia
Deutsche Bank AG — Singapore

Top beneficiary countries revealed in the FinCEN files:

United Kingdom
United Arab Emirates
Malaysia
Mauritius
Denmark
Macao
Vietnam
Paraguay
Hungary
Monaco

Using a combination of the PageRank and Degree algorithms, we identified the most prominent countries in the network and tagged them as Originator (O), Intermediary (I) or Tax haven (TH), as listed below:

Latvia (TH)
Russia (I)
United States (I)
Switzerland (TH)
United Arab Emirates (I)
United Kingdom (I)
Hong Kong (I)
Singapore (I)
Cyprus (I)
China (O)
Netherlands (I)
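The post does not publish its PageRank or Degree configuration. As a hedged illustration, the sketch below implements standard power-iteration PageRank (damping factor 0.85, a common default and an assumption here) together with in- and out-degree counts on a hypothetical country-level flow graph. How the authors combined these scores into the O/I/TH tags is not described, so the sketch stops at computing the scores.

```python
from collections import Counter, defaultdict


def pagerank(edges, damping=0.85, max_iter=100, tol=1e-10):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs.

    Dangling nodes (no outgoing edges) spread their rank uniformly over all nodes.
    """
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = defaultdict(list)
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src in nodes:
            targets = out_links[src]
            for dst in targets:
                new_rank[dst] += damping * rank[src] / len(targets)
        converged = sum(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


def degrees(edges):
    """Return (in_degree, out_degree) Counters for a directed edge list."""
    in_deg = Counter(dst for _src, dst in edges)
    out_deg = Counter(src for src, _dst in edges)
    return in_deg, out_deg


# Hypothetical country-level flows: three countries send money to "Hub".
flows = [("Country A", "Hub"), ("Country B", "Hub"), ("Country C", "Hub")]
rank = pagerank(flows)
in_deg, out_deg = degrees(flows)
print(max(rank, key=rank.get))        # Hub
print(in_deg["Hub"], out_deg["Hub"])  # 3 0
```

PageRank rewards vertices that receive flow from other well-connected vertices, while raw degree only counts connections; looking at both helps distinguish a country that merely has many links from one that sits where money converges.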

Conclusion

Analyzing highly interconnected data such as financial records with conventional methodologies is both counterintuitive and computationally expensive. A cognitive computing approach based on graph analytics helps us detect fraud patterns and non-obvious relationships, and helps us see the bigger picture by visualizing the flow of money through originators, intermediaries and tax havens. For more information on how our offering can help your organization protect your customers, and to share your feedback, please visit the IBM Financial Crimes Insight product website.

Published in IBM Data Science in Practice

IBM Data Science in Practice is written by data scientists for data scientists to gain hands-on and in-depth learning and to read about inspirational applications and conceptual understanding for challenging topics in the field. Discuss and network: community.ibm.com/datascience

Written by Madhu Vasudevan

Engineer | Avid Reader | Cat person