Unmasking Money Laundering: Graph Neural Networks & the Elliptic2 Dataset

Ervin Zubic
May 2, 2024


New research reveals how machine learning and a massive dataset expose hidden money laundering patterns in cryptocurrency. Learn how ‘subgraph representation’ could revolutionize anti-money laundering (AML) efforts.

Network Intricacies. Image created using DALL-E.

This article is also published on Mirror.xyz.

In the evolving landscape of financial technology, applying advanced computational methods to strengthen security has become a critical focus. “The Shape of Money Laundering: Subgraph Representation Learning on the Blockchain with the Elliptic2 Dataset,” a research article co-authored by Claudio Bellei and others, including researchers from Elliptic and MIT, sheds light on a novel approach to anti-money laundering (AML) using Graph Neural Networks (GNNs). Released in spring 2024, the paper introduces the Elliptic2 dataset, designed specifically to support subgraph representation learning so that money laundering activity in cryptocurrency networks can be better detected and analyzed.

Summary of the Research Article

The research addresses the urgent need for effective tools to detect money laundering in cryptocurrency transactions. The authors argue that while cryptocurrencies offer users pseudonymity, their public transaction records create a unique opportunity for AML solutions. Using a newly introduced dataset called Elliptic2, which contains over 122K labeled subgraphs embedded in a larger background graph of 49 million nodes (clusters of Bitcoin addresses) and 196 million transactions, the researchers propose a methodology for identifying potentially illicit activity from the patterns these subgraphs form.
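The structure described above, labeled subgraphs that are small sets of nodes living inside one large background graph, can be sketched as a pair of Python data structures. This is a minimal illustration only: the class names, field names, and toy values are hypothetical and do not reflect the actual files distributed with Elliptic2, and node features are reduced to short lists.

```python
from dataclasses import dataclass


@dataclass
class BackgroundGraph:
    # Hypothetical schema: each node id stands for a cluster of Bitcoin addresses.
    node_features: dict[int, list[float]]
    # Each directed edge (source, destination) stands for transactions between two clusters.
    edges: list[tuple[int, int]]


@dataclass
class LabeledSubgraph:
    # A subgraph is just a set of node ids that refer back to the shared background graph.
    node_ids: set[int]
    label: str  # "licit" or "suspicious"


def induced_edges(graph: BackgroundGraph, sub: LabeledSubgraph) -> list[tuple[int, int]]:
    """Return the background edges whose endpoints both fall inside the subgraph."""
    return [(u, v) for u, v in graph.edges if u in sub.node_ids and v in sub.node_ids]


graph = BackgroundGraph(
    node_features={0: [0.1], 1: [0.5], 2: [0.9], 3: [0.2]},
    edges=[(0, 1), (1, 2), (2, 3), (3, 0)],
)
sub = LabeledSubgraph(node_ids={0, 1, 2}, label="suspicious")
print(induced_edges(graph, sub))  # [(0, 1), (1, 2)]
```

Storing a subgraph as node ids rather than as a copied graph keeps both its internal shape (the induced edges) and its surroundings (edges leading out to node 3) readable from the same background structure.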

The methodology applies scalable GNNs to the relational information embedded in these subgraphs, giving a deeper understanding of the ‘shapes’ or patterns that characterize money laundering in cryptocurrency. The paper describes the dataset’s structure, the features attached to nodes and edges, and the process used to label subgraphs as licit or suspicious, providing a substantial foundation for both theoretical exploration and practical AML applications.
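To make "subgraph representation" concrete, here is a minimal sketch of the general idea: pass messages over the background graph so each node absorbs information from its neighbours, then pool the node vectors of one subgraph into a single vector that a classifier can score. This is a generic, untrained, single-layer mean-aggregation example written for illustration; it is not one of the architectures benchmarked in the paper, and the function names, the `self_weight` mixing parameter, and the classifier weights are all hypothetical.

```python
import math


def message_pass(features, edges, self_weight=0.5):
    """One round of untrained mean aggregation over the background graph.

    Edges are treated as undirected here, a simplification: each node's new
    vector mixes its own features with the mean of its neighbours' features.
    """
    neighbours = {node: [] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated = {}
    for node, feat in features.items():
        nbrs = neighbours[node]
        if nbrs:
            agg = [sum(features[m][i] for m in nbrs) / len(nbrs) for i in range(len(feat))]
        else:
            agg = [0.0] * len(feat)
        updated[node] = [self_weight * f + (1 - self_weight) * a for f, a in zip(feat, agg)]
    return updated


def subgraph_embedding(node_embeddings, node_ids):
    """Mean-pool node vectors over the subgraph to get one vector per subgraph."""
    ids = sorted(node_ids)
    dim = len(node_embeddings[ids[0]])
    return [sum(node_embeddings[n][i] for n in ids) / len(ids) for i in range(dim)]


def score(embedding, weights, bias=0.0):
    """Logistic score in (0, 1); the weights are hypothetical, not trained."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy background graph: node 2 lies outside the subgraph {0, 1} but still informs node 1.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges = [(0, 1), (1, 2)]
emb = message_pass(features, edges)
vec = subgraph_embedding(emb, {0, 1})
print(vec)  # [0.5, 0.625]
print(score(vec, weights=[1.0, -1.0]))
```

The point of the toy example is that the subgraph's vector depends on node 2 even though node 2 is not part of the labeled subgraph, which is one reason a subgraph is classified in the context of the larger graph rather than in isolation.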

Figure 1. This diagram illustrates the structure of a dataset used in blockchain analysis, where each node represents a cluster of Bitcoin addresses, and the connections (edges) denote transactions, with specific pathways marked as suspicious or licit subgraphs. Source: The Shape of Money Laundering: Subgraph Representation Learning on the Blockchain with the Elliptic2 Dataset, pg. 2.

Critical Analysis

The strength of this research lies in its pioneering use of subgraph representation learning, which analyzes patterns across groups of connected entities instead of scoring each node separately, as traditional node-level studies do. This approach can improve the accuracy of identifying illicit activity, and it contributes to the broader field of graph-based learning by supplying a large real-world benchmark that challenges existing methods.

However, potential limitations include the dependence on the availability and accuracy of labeled data, which is crucial for training the models effectively. And while the dataset is extensive, its real-world applicability may be limited by the dynamic nature of money laundering tactics, which continually evolve as regulators and criminals adapt to new technologies.

Highlight: The Most Surprising Aspect

Perhaps the most intriguing aspect of this study is the introduction of the Elliptic2 dataset itself. Unlike any previously available dataset, it offers real-world transaction data at an unprecedented scale, labeled specifically for AML research. The dataset is valuable to researchers in its own right and demonstrates the potential of machine learning to transform financial security.

Figure 2. This table summarizes the attributes of the Elliptic2 dataset, detailing the number of nodes, edges, subgraphs, node features, and edge features, and provides statistics for subgraphs classified as licit and suspicious. Source: The Shape of Money Laundering: Subgraph Representation Learning on the Blockchain with the Elliptic2 Dataset, pg. 3.
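Per-class statistics like those in the table matter because the balance between licit and suspicious examples affects how a classifier should be trained and evaluated. The sketch below shows how such a breakdown could be computed from a list of labeled subgraphs. The input format, a list of (node count, edge count, label) tuples, is a hypothetical choice for illustration, and the toy values are invented; they are not the Elliptic2 figures.

```python
from collections import defaultdict


def subgraph_stats(subgraphs):
    """Summarize labeled subgraphs per class.

    `subgraphs` is a list of (node_count, edge_count, label) tuples, a
    hypothetical input format chosen for illustration.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # label -> [subgraphs, nodes, edges]
    for n_nodes, n_edges, label in subgraphs:
        totals[label][0] += 1
        totals[label][1] += n_nodes
        totals[label][2] += n_edges
    grand_total = sum(t[0] for t in totals.values())
    return {
        label: {
            "count": count,
            "share": count / grand_total,     # fraction of all labeled subgraphs
            "avg_nodes": nodes / count,       # mean subgraph size in nodes
            "avg_edges": edges / count,       # mean subgraph size in edges
        }
        for label, (count, nodes, edges) in totals.items()
    }


# Toy input only; these are not the Elliptic2 figures.
toy = [(3, 2, "licit"), (5, 4, "licit"), (2, 1, "licit"), (4, 6, "suspicious")]
for label, stats in subgraph_stats(toy).items():
    print(label, stats)
```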

Implications and Potential

The implications of this research extend beyond academia into practical applications within financial institutions and law enforcement agencies. If they improve the detection of suspicious activity, potentially in near real time, the methods developed on this dataset could substantially strengthen AML procedures worldwide. The open release of the dataset encourages further innovation and collaboration in the field and could help set new standards for AML practice in the digital age.


“The Shape of Money Laundering” is a landmark study that significantly advances the application of graph neural networks in the fight against financial crime. Its detailed analysis of subgraph patterns in cryptocurrency transactions offers new insights and tools for regulators and financial analysts. By pushing the boundaries of data science in financial security, this research not only addresses immediate challenges in cryptocurrency regulation but also opens up new avenues for future technological advancements in AML strategies.

Discover how blockchain is transforming industries on the Blockchain Insights Hub. Follow me on Twitter for real-time updates on the intersection of blockchain and cybersecurity. Subscribe now to get my exclusive report on the top blockchain security threats of 2024. Dive deeper into my blockchain insights on Mirror.xyz.



Ervin Zubic

Writing about cyber threat intelligence, OSINT, financial crime, and blockchain forensics. Follow me on Twitter for the latest insights.