Graph Analytics on Financial Crime Detection for Different Levels of Transaction

Dec 29, 2023

By Abigail Antenor, Sook-Yee Chong
Data Scientists
Artificial Intelligence and Innovation Center of Excellence
Aboitiz Data Innovation and Unionbank of the Philippines

Many financial institutions employ rule-based systems to detect malicious activities within the bank. However, fraud tactics have evolved over the years, and we must continually enhance our detection methods to keep up with them.

Our study suggests that strategic use of the right data and tools can significantly enhance our ability to quickly spot patterns within transaction flows and prioritize suspicious accounts for further investigation. Our team delved into the field of network science to examine the intricate relationships between accounts and determine how money is transferred across a network.

A better understanding of the levels of transaction, discussed further in Section III, gives a clearer picture of how fraud behaves in a network. For instance, fraudsters often exploit intermediary accounts to conceal their identity and facilitate money laundering across a network. Detecting this influence at an early stage mitigates potential damage and losses. One of the techniques we employed is measuring node centralities, discussed in Section IV. This approach allows us to quantify the number of connections linked to each account, identify accounts serving as brokers or bridges, and evaluate how quickly and efficiently a particular account or entity can reach other accounts for transactions. Accounts that score highly on these measures are key players that can facilitate the smooth flow of funds through the network.

Subsequently, we investigated the correlation between these measures and fraud across different levels of connection. By expanding our analysis to three levels of connection, we gained a more comprehensive understanding of potential vulnerabilities and their impact on the network.

This article covers the following: (1) the concept of fraud, (2) the status of fraud in the Philippines, (3) the utilization of Python packages and graph databases to investigate patterns of fraud, and (4) network metrics for fraud detection.

I. Concept of fraud

Filipinos must become more aware of the growing financial crimes in the country. As we aim to prevent them from happening, we also find it crucial to spot these patterns easily, create awareness, and educate the public so that they can feel safer when doing transactions. To do this, we want to first introduce what is considered fraud and its different manifestations.

Article 1338 of the Civil Code of the Philippines defines fraud as the occurrence wherein one contracting party, through manipulative words and actions, persuades the other party to enter into an agreement that they would not have agreed to otherwise. The deception in this context refers to a sly scheme with fraudulent intent, often executed through the concealment or omission of material facts. (Supreme Court E-library, 2013)

Various manifestations of fraud in financial institutions include:

1. Card cloning: This involves the replication of credit, debit, or ATM cards to conduct unauthorized purchases or withdraw funds from someone else’s account.

2. Phishing/Scam: This includes scenarios where someone pretends to be a legitimate institution, sending out text messages and emails with the intent to lure someone into giving out their personal information. Phishing can result in identity theft.

3. Identity theft: The unsanctioned use of another person’s personally identifiable information (PII) to engage in illicit transactions.

4. Unofficial online lenders: Fraudsters pose as lenders from either legitimate financial institutions or private entities, swindling people into making malicious transactions with them.

5. Money mules: Money mules, knowingly or unknowingly, act as accomplices in fraudulent activities by providing their accounts for seemingly legitimate transactions whose funds may be laundered as part of a larger fraud scheme.

6. Loan application scams: Fraudsters involve employees in order to secure loan approvals easily through manipulation. (Bangko Sentral ng Pilipinas, 2023)

II. Status of fraud in the Philippines

Now that we understand what constitutes fraudulent activities, how much harm do they cause to others? What is the prevalence of fraud in the Philippines? And why is it important to address and discuss this issue?

In 2022, 8.7% of digital transactions in the Philippines were suspected to be fraudulent, with a digital fraud rate much higher than the global average, ranking the country as the third highest among all regions in the study conducted by TransUnion. (TransUnion, 2023)

Additionally, as reported by the Anti-Money Laundering Council (AMLC), the surge in the number of suspicious transaction reports (STRs) in 2021 can be attributed to the widespread adoption of digital banking and electronic wallets. This upward trend is in line with the significant increase in electronic fund transfers through PESONet and InstaPay, which recorded growth rates of 164% and 223%, respectively, during the first half of 2021. Moreover, the Financial Intelligence Unit indicates that 89% of suspicious transactions reported in 2021 were related to money mules, with the remaining 11% spanning from 2016 to 2021. In terms of the monetary value, this accounted for 99% of the total value over the entire six-year period (i.e., PHP 505 billion).

Because of these prevalent cases, it was recommended that businesses continue to equip themselves with proper tools and new methods to detect fraud without hindering the consumer journey.

III. Utilization of Python packages and graph databases to detect patterns of fraud

To enhance fraud detection within the bank, our team delved into the field of network science.

While banks typically employ fraud detection systems, these often have limited capabilities, focusing on basic features like demographics or analyzing only direct transactions (Figure 1a), neglecting higher-degree connections (Figures 1b and 1c). This limitation can result in overlooking many fraudulent transactions. Examining higher levels will help identify more complex fraud patterns or reveal collusive fraud behaviour involving multiple groups. For example, exploring third-level connections (Figure 1c) enables the detection of fraud schemes that may involve multiple layers of deception, three accounts away from the source.

To enhance the system, the team implemented network analysis to identify intricate relationships between transactions and scrutinize suspicious activities up to three levels of connection. We enriched this analysis with attributes such as the account holder’s name, transaction date, amount, and fraud labels for a more comprehensive evaluation.

Figure 1a: First level of connection or direct transaction
Figure 1b: Second level of connection
Figure 1c: Third level of connection
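The idea of connection levels can be sketched with a breadth-first search that records how many hops separate each account from a starting account. This is a minimal illustration only: the account IDs, the `transfers` list, and the `connection_levels` helper are hypothetical and are not taken from our production data or code.

```python
from collections import deque

# Hypothetical transfers (sender, receiver); the account IDs are made up.
transfers = [
    ("acct_A", "acct_B"),
    ("acct_B", "acct_C"),
    ("acct_C", "acct_D"),
    ("acct_D", "acct_E"),
    ("acct_A", "acct_F"),
]


def connection_levels(transfers, source, max_level=3):
    """Map every account within max_level hops of source to its level."""
    # Assumption: transfers are treated as undirected links between accounts.
    neighbors = {}
    for sender, receiver in transfers:
        neighbors.setdefault(sender, set()).add(receiver)
        neighbors.setdefault(receiver, set()).add(sender)

    levels = {source: 0}
    queue = deque([source])
    while queue:
        account = queue.popleft()
        if levels[account] == max_level:
            continue  # do not expand beyond the requested level
        for other in neighbors.get(account, ()):
            if other not in levels:
                levels[other] = levels[account] + 1
                queue.append(other)
    del levels[source]  # report only the connected accounts
    return levels


levels = connection_levels(transfers, "acct_A")
for account, level in sorted(levels.items(), key=lambda item: item[1]):
    print(f"{account}: level {level}")
```

In this toy data, acct_B and acct_F are first-level connections of acct_A, acct_C is second-level, and acct_D is third-level, while acct_E lies four hops away and is left out.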

The following are some of the Python packages we used in the study:

NetworkX is a popular library in the Python ecosystem. It is easy to use and well integrated with other machine-learning libraries. It is valuable in fraud detection because it allows relationships between entities to be modelled as a graph. Its algorithms, such as centrality measures and community detection (further discussed in Section IV), can uncover patterns indicative of fraudulent activities. For instance, anomalies may be identified by detecting nodes with unusual influence or tightly connected communities within a network.
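As a rough sketch of what modelling transactions as a graph can look like in NetworkX, the snippet below loads a few made-up records into a directed graph and keeps the transaction attributes on the edges. The record fields (`sender`, `receiver`, `amount`, `date`, `is_fraud`) are assumptions for illustration, not the schema used in our study.

```python
import networkx as nx

# Hypothetical transaction records; field names and values are assumptions.
transactions = [
    {"sender": "acct_A", "receiver": "acct_B", "amount": 5000.0,
     "date": "2023-01-05", "is_fraud": False},
    {"sender": "acct_B", "receiver": "acct_C", "amount": 4900.0,
     "date": "2023-01-06", "is_fraud": True},
    {"sender": "acct_C", "receiver": "acct_D", "amount": 4800.0,
     "date": "2023-01-07", "is_fraud": False},
]

# Accounts become nodes; each transfer becomes a directed edge with attributes.
G = nx.DiGraph()
for tx in transactions:
    G.add_edge(
        tx["sender"],
        tx["receiver"],
        amount=tx["amount"],
        date=tx["date"],
        is_fraud=tx["is_fraud"],
    )

# Pull out the transfers that carry a fraud label.
flagged = [(u, v) for u, v, data in G.edges(data=True) if data["is_fraud"]]
print(G.number_of_nodes(), G.number_of_edges(), flagged)
```

Note that a `DiGraph` keeps only one edge per ordered pair of accounts, so repeated transfers between the same two accounts would overwrite each other; `nx.MultiDiGraph` is one option when every transaction must be retained.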

igraph is available in both Python and R. It excels on larger-scale networks that demand high-performance computation. It computes key network characteristics, such as betweenness and other centrality measures (further discussed in Section IV), to identify critical nodes and potential intermediaries in fraudulent schemes. igraph also has community detection algorithms that reveal closely knit groups within networks, helping analysts uncover collusion and make fraud detection more precise.

Fraud detection employs a variety of techniques to identify and prevent deception. These include the centrality measures and community detection mentioned above, as well as anomaly detection, pattern recognition, and the combination of network analysis with machine learning models for a more sophisticated approach.

There are several reasons why graph centrality measures are employed. A study by Yoo et al. (2023) showed that utilizing centrality measures in Medicare fraud detection enhances precision by 4%, recall by 24%, and F1-score by 14% compared to graph neural network models. Prusti et al. (2021) revealed that incorporating centrality features led to an average improvement of up to 6% in the evaluation metrics of machine learning algorithms. As centrality measures offer insights into the relative importance and influence of account holders or entities within a network, their application can reveal key relationships and expose major players in fraudulent activities that may go unnoticed by other models.

In the following sections, we will describe centrality measures and community detection utilized in fraud detection in more detail.

IV. Network metrics for fraud detection

(i) Centrality measures

Understanding the centrality measures of a network is a valuable approach for flagging suspicious account holders. By identifying central nodes (or account holders), which represent influential accounts, targeted interventions can be implemented. In the section below, we will describe unusual changes in centrality measures that may indicate potentially fraudulent behavior. We will also describe how these influential account holders affect the way money flows through the network in a coordinated money laundering scheme.

Let us examine the network shown in Figure 2 as we explore each centrality measure to identify which of Nodes 1, 2, and 3 serves as the center of the network.

Figure 2: A simple undirected network (relationships are non-directional). We will identify which of Nodes 1, 2, and 3 is the center.

(a) Degree centrality

Degree centrality highlights the most connected nodes.

  • Measures the number of connections a node has in a network
  • Indicates prominent nodes within a network, i.e., the most connected nodes, emphasizing their significance based on the number of edges they possess.
Figure 3: Degree of Node 1 is 4 since there are 4 nodes connected to it.
Figure 4: Degree of Node 2 is 2 since there are 2 nodes connected to it.
Figure 5: Degree of Node 3 is 3 since there are 3 nodes connected to it.
Table 1: Node 1 is the most connected node based on degree centrality.

As shown in Table 1, Node 1 has the highest degree centrality and is thus the node with the most connections. Node 1 may be a focal point for fraud investigation, as a highly connected account could be the central point for coordinating fraudulent activities and may exert significant control over money transfers within a network.
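To make the counting concrete, here is a small from-scratch sketch of degree centrality. The toy edge list is hypothetical (it is not the network in Figure 2), and the helper name `degree_centrality` is ours for illustration.

```python
# Hypothetical toy network: A is a hub, E bridges A to two short chains.
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("E", "F"), ("E", "G"), ("F", "H"), ("G", "I"),
]


def degree_centrality(edges):
    """Return raw degree and degree normalized by (n - 1) for each node."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    n = len(neighbors)
    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}
    # Dividing by (n - 1) gives the fraction of other nodes a node touches.
    normalized = {node: d / (n - 1) for node, d in degree.items()}
    return degree, normalized


degree, normalized = degree_centrality(edges)
most_connected = max(degree, key=degree.get)
print(most_connected, degree[most_connected], normalized[most_connected])
```

In practice the same normalized values can be obtained with NetworkX's `nx.degree_centrality`; the manual version is shown only to expose what is being counted.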

(b) Closeness centrality

Closeness centrality identifies how quickly a node can reach all other nodes in a network.

  • Reciprocal of the sum of the shortest-path distances (in hops) from a node to all other nodes
  • Highlights nodes that are close to all others
Figure 6: The number of hops in this context is the number of edges linking 2 nodes. For example, going from the upper leftmost node to Node 1 takes 1 hop, while going from the upper leftmost node to Node 2 takes 2 hops, and so on.
Table 2: Node 3 is the central node based on closeness centrality.

At higher levels of connection, nodes with high closeness centrality are well placed to move money to a large portion of the network. As shown in Table 2, Node 3 has the highest closeness centrality. Node 3 may be flagged as suspicious because this account holder has the shortest average distance to all other nodes, allowing it to control the flow of transactions and enabling efficient money flow within the network. Due to its central position, this account holder may play a key role in orchestrating fraudulent activities by connecting other fraudulent account holders in the network.
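The hop counting behind closeness centrality can be sketched with a breadth-first search from each node. The toy network below is hypothetical (not the one in Figure 2), and the sketch assumes the network is connected; it uses the common normalized form, (n - 1) divided by the sum of hop distances.

```python
from collections import deque

# Hypothetical toy network: A is a hub, E bridges A to two short chains.
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("E", "F"), ("E", "G"), ("F", "H"), ("G", "I"),
]


def hop_distances(neighbors, source):
    """Breadth-first search returning the hop count from source to each node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for other in neighbors[node]:
            if other not in dist:
                dist[other] = dist[node] + 1
                queue.append(other)
    return dist


def closeness_centrality(edges):
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    n = len(neighbors)
    closeness = {}
    for node in neighbors:
        total_hops = sum(hop_distances(neighbors, node).values())
        # Assumes a connected network, so every other node is reachable.
        closeness[node] = (n - 1) / total_hops
    return closeness


closeness = closeness_centrality(edges)
closest = max(closeness, key=closeness.get)
print(closest, round(closeness[closest], 3))
```

In this toy network, the hub A has the most connections, yet the bridge E has the highest closeness because it sits fewer hops away from the rest of the network.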

(c) Betweenness centrality

Betweenness centrality identifies nodes that act as critical intermediaries, gatekeepers, or brokers in a network.

  • How often does this node occur on the shortest paths between other nodes?
  • The number of shortest paths passing through the node divided by the number of all shortest paths, summed over pairs of other nodes

In Figure 7 below, going from the blue node to the green node by way of Node 1 counts as one path. The total number of paths that lead to the green node and pass through Node 1 is the value in the first row and first column of Table 3.

Figure 7: A path in a network is a sequence of nodes in which each node is connected by one edge to the next. In the first network, going from the green node to the blue node is 1 path which passes through node 1. In the second network, the path going from the green node to the blue node, which passes through Node 1, is 1 path.
Table 3: Node 3 is the central node based on betweenness centrality.

As shown in Table 3, Node 3 has the highest betweenness centrality. It holds a strategically important position to control or manipulate the flow of transactions between seemingly unrelated groups. In a fraud network, removing or investigating an account holder with high betweenness may disrupt transactions within the network and expose other hidden connections that rely on this account holder. Any sudden increase in fund transfers, or any unexpected fund transfers, through these accounts should be flagged as suspicious. Thus, early identification of these accounts is important to understand the overall connectivity and efficiency of the network.
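As an illustration of how shortest paths through a node can be counted, the sketch below follows the general idea of Brandes' algorithm on a hypothetical toy network (not the one in Figure 2). It returns raw, unnormalized counts for an undirected network; the function name and data are ours.

```python
from collections import deque

# Hypothetical toy network: A is a hub, E bridges A to two short chains.
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("E", "F"), ("E", "G"), ("F", "H"), ("G", "I"),
]


def betweenness_centrality(edges):
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    betweenness = {node: 0.0 for node in neighbors}
    for source in neighbors:
        # Breadth-first search that counts shortest paths (sigma) from source.
        order = []
        preds = {node: [] for node in neighbors}
        sigma = {node: 0 for node in neighbors}
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Walk back from the farthest nodes, accumulating path fractions.
        delta = {node: 0.0 for node in neighbors}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                betweenness[w] += delta[w]

    # Each undirected pair was counted once from each endpoint.
    return {node: score / 2 for node, score in betweenness.items()}


betweenness = betweenness_centrality(edges)
broker = max(betweenness, key=betweenness.get)
print(broker, betweenness[broker])
```

Here the bridge E lies on more shortest paths than the hub A, even though A has more direct connections. NetworkX's `nx.betweenness_centrality` computes the same measure but normalizes the scores by default, so its numbers will be scaled differently.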

(d) PageRank centrality

PageRank centrality assesses a node’s importance based on the quality and quantity of its connections in a network, highlighting nodes with great influence. It takes into account inbound and outbound links, so it works with both directed and undirected networks. While it was created by the co-founders of Google to rank important web pages in their search engine (PageRank US Patent US6285999B1), it can also be applied to fraud detection. PageRank can detect account holders or entities that have a disproportionate influence within a network. If multiple nodes with high PageRank scores are connected, it could indicate collusion or coordinated fraudulent activities.

Figure 8: PageRank of each node in Iteration 0.
Figure 9: This shows how to get the PageRank (PR) of Node 1 in Iteration 1. To do that, we must get the PageRank of each node connected to it in Iteration 0 and then divide it by the number of outgoing links (OL) of that node. We then get the sum and that’ll be the new PageRank of Node 1.
Table 4: Node 1 is the central node based on PageRank centrality.
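The update described in Figure 9 can be sketched as follows: each node's new PageRank is the sum, over its neighbors, of the neighbor's previous PageRank divided by that neighbor's number of outgoing links. The toy network is hypothetical (not the one in Figure 2). The `damping` parameter is an addition of ours: setting it to 1.0 reproduces the plain update in Figure 9, while common PageRank formulations use a value such as 0.85 to help the iteration settle.

```python
# Hypothetical toy network: A is a hub, E bridges A to two short chains.
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("E", "F"), ("E", "G"), ("F", "H"), ("G", "I"),
]


def build_neighbors(edges):
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def pagerank_step(neighbors, ranks, damping=1.0):
    """One iteration: sum of PR(neighbor) / outgoing links of that neighbor."""
    n = len(neighbors)
    new_ranks = {}
    for node in neighbors:
        # In an undirected network, outgoing links equal the node's degree.
        incoming = sum(ranks[other] / len(neighbors[other])
                       for other in neighbors[node])
        new_ranks[node] = (1 - damping) / n + damping * incoming
    return new_ranks


def pagerank(edges, damping=0.85, iterations=50):
    neighbors = build_neighbors(edges)
    n = len(neighbors)
    ranks = {node: 1 / n for node in neighbors}  # Iteration 0: equal shares
    for _ in range(iterations):
        ranks = pagerank_step(neighbors, ranks, damping)
    return ranks


neighbors = build_neighbors(edges)
initial = {node: 1 / len(neighbors) for node in neighbors}
first_step = pagerank_step(neighbors, initial)  # undamped, as in Figure 9
ranks = pagerank(edges)
top_node = max(ranks, key=ranks.get)
print(round(first_step["A"], 3), top_node)
```

For production use, NetworkX provides `nx.pagerank`, which handles details this sketch leaves out, such as nodes without outgoing links and convergence checks.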

(ii) Community detection

Several methods can be employed to identify groups or communities of nodes in a network that are more densely connected to each other than to the rest of the network. Each of the methodologies listed below was developed with its own strength and can be applied to fraud detection.

In general, there are two broad approaches to identifying clusters or communities: the divisive method (e.g., the Girvan-Newman algorithm) (Arasteh and Alizadeh, 2019) and the agglomerative method (e.g., the Louvain algorithm) (Chaudhary and Singh, 2019). They differ in how they form and merge communities. The choice of algorithm will depend on the network size, the level of granularity in community detection, and the specific characteristics of the fraud being targeted. A combination of both algorithms may even be effective, depending on the use case.

(a) Louvain algorithm

The Louvain algorithm takes a bottom-up approach: each node starts in its own community, and nodes and communities are merged iteratively whenever doing so increases modularity, a score of how much denser the connections within communities are than would be expected by chance. In other words, it optimizes the identification of groups of nodes that are denser internally than with the rest of the network (Figure 10). It is effective for identifying communities in large-scale networks.

Figure 10: Visualisation of Louvain’s clustering algorithm.
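The quantity that Louvain tries to maximize is modularity. The sketch below computes modularity for a given partition of a hypothetical toy network (two triangles joined by a single edge); it is not an implementation of Louvain itself, and the `modularity` helper and data are ours for illustration.

```python
# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]


def modularity(edges, communities):
    """Modularity of a partition of an unweighted, undirected network."""
    m = len(edges)
    community_of = {node: idx
                    for idx, members in enumerate(communities)
                    for node in members}

    internal_edges = [0] * len(communities)  # edges inside each community
    total_degree = [0] * len(communities)    # summed degree per community
    for u, v in edges:
        total_degree[community_of[u]] += 1
        total_degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1

    # Observed share of internal edges minus the share expected by chance.
    return sum(internal / m - (deg / (2 * m)) ** 2
               for internal, deg in zip(internal_edges, total_degree))


q_split = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(edges, [{0, 1, 2, 3, 4, 5}])
print(round(q_split, 3), round(q_single, 3))
```

Splitting the toy network into its two triangles scores higher than lumping everything into one community, which is the kind of improvement Louvain searches for. Recent NetworkX versions expose `nx.community.louvain_communities` and `nx.community.modularity` for real use.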

Fraudulent activities often involve collaboration or coordination among multiple individuals. The Louvain algorithm can reveal distinct communities with unusually high or low transaction volumes, and running it repeatedly over time can reveal sudden changes in community structure. These communities may show specific behavioral traits or patterns that differ from legitimate ones. In addition, any accounts that do not fit easily into any community, or that have connections spanning multiple communities, such as node 0 in Figure 11, may be flagged as anomalies that should be investigated for potential fraud.

Figure 11: Connections of node 0 span several communities.
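Once a community label is available for each account, accounts whose connections span several communities can be listed with a few lines of code. The sketch below assumes the labels have already been produced by some community detection step; the edges, the labels, and the threshold of two outside communities are all hypothetical choices for illustration.

```python
# Hypothetical transfers between accounts.
edges = [
    ("acct_0", "acct_1"), ("acct_1", "acct_2"),
    ("acct_0", "acct_3"), ("acct_3", "acct_4"),
    ("acct_0", "acct_5"), ("acct_5", "acct_6"),
]

# Hypothetical community labels, e.g. the output of a community detection run.
community_of = {
    "acct_0": 0,
    "acct_1": 1, "acct_2": 1,
    "acct_3": 2, "acct_4": 2,
    "acct_5": 3, "acct_6": 3,
}


def communities_touched(edges, community_of):
    """For each account, collect the communities of its direct neighbors."""
    touched = {}
    for u, v in edges:
        touched.setdefault(u, set()).add(community_of[v])
        touched.setdefault(v, set()).add(community_of[u])
    return touched


touched = communities_touched(edges, community_of)

# Flag accounts linked to at least two communities other than their own.
# The threshold of 2 is an assumption and would need tuning on real data.
spanning = sorted(
    account
    for account, communities in touched.items()
    if len(communities - {community_of[account]}) >= 2
)
print(spanning)
```

Only acct_0 is flagged here, because it is the only account whose neighbors belong to several different communities.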

(b) Girvan-Newman algorithm

The Girvan-Newman algorithm (Girvan and Newman, 2002) takes the opposite approach to Louvain: it recursively divides the network into smaller subgroups (or communities). It is effective for detecting communities with well-defined boundaries.

The process iteratively removes the edge with the highest betweenness centrality to disrupt the flow between communities, thus revealing the network’s structure. For example, the edge between nodes [0,31] has the highest betweenness score and was removed first (Figure 12). At the 10th iteration, the edge between nodes [2,13] was removed.

Figure 12: The edge (in red) with the highest betweenness is iteratively removed.
Figure 13: After 10 iterations, 2 to 3 possible communities are revealed.

As mentioned above, high betweenness represents critical connections or bridges within the network. In Figure 13, Girvan-Newman reveals communities (after edge removal), which may be utilised to identify potential fraud rings (El Ayeb et al., 2020) or collusion.
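The edge-removal loop can be sketched from scratch as follows. This is a simplified illustration on a hypothetical toy network (two triangles joined by one edge), not the network in Figures 12 and 13: it removes the edge with the highest betweenness until the network splits into more pieces, then stops. The helper names are ours.

```python
from collections import deque

# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]


def build_neighbors(edges):
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def edge_betweenness(neighbors):
    """Raw count of shortest paths crossing each undirected edge."""
    scores = {}
    for source in neighbors:
        order, dist = [], {source: 0}
        preds = {node: [] for node in neighbors}
        sigma = {node: 0 for node in neighbors}
        sigma[source] = 1
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in neighbors}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = tuple(sorted((v, w)))
                scores[edge] = scores.get(edge, 0.0) + share
                delta[v] += share
    # Each pair of endpoints was counted from both directions.
    return {edge: score / 2 for edge, score in scores.items()}


def connected_components(neighbors):
    seen, components = set(), []
    for start in neighbors:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for other in neighbors[node]:
                if other not in component:
                    component.add(other)
                    queue.append(other)
        seen |= component
        components.append(component)
    return components


def girvan_newman_split(edges):
    """Remove highest-betweenness edges until the network splits further."""
    neighbors = build_neighbors(edges)
    start_count = len(connected_components(neighbors))
    removed = []
    while len(connected_components(neighbors)) == start_count:
        scores = edge_betweenness(neighbors)
        if not scores:
            break  # no edges left to remove
        u, v = max(scores, key=scores.get)
        neighbors[u].discard(v)
        neighbors[v].discard(u)
        removed.append((u, v))
    return removed, connected_components(neighbors)


removed, communities = girvan_newman_split(edges)
print(removed, communities)
```

In the toy network, the bridging edge is removed first and the two triangles fall out as separate communities. NetworkX offers `nx.community.girvan_newman`, which yields successive splits and is the more practical choice on real data.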

The above algorithms are useful at higher levels of connection, as fraudulent activities often involve collaboration between multiple entities. Connections at higher levels may reveal the existence of fraud rings or organized groups of individuals involved in fraudulent activities. Community detection can help uncover these structures and highlight relationships that might not be immediately apparent when analyzing only direct, first-level transactions. Expanding the analysis beyond the first level allows for a more nuanced understanding of the intricate relationships within a network and strengthens our fraud detection efforts.

References:

G.R. No. 171428: ALEJANDRO V. TANKEH, PETITIONER, VS. DEVELOPMENT BANK OF THE PHILIPPINES, STERLING SHIPPING LINES, INC., RUPERTO V. TANKEH, VICENTE ARENAS, AND ASSET PRIVATIZATION TRUST, RESPONDENTS. Decision. Supreme Court E-Library. elibrary.judiciary.gov.ph/thebookshelf/showdocs/1/56359#:~:text=Under%20Article%201338%20of%20the,would%20not%20have%20agreed%20to.

Bangko Sentral ng Pilipinas [Consumer Protection and Market Conduct Office Strategic Communication and Advocacy]. “PROTECT YOURSELF FROM FRAUD AND SCAM.” Bangko Sentral Ng Pilipinas, www.bsp.gov.ph/Media_and_Research/Primers%20Faqs/Protect_yourself_from_Fraud_and_Scam. Accessed 21 Dec. 2023.

TransUnion. “TransUnion Report Finds Digital Fraud Attempts Fall 18% in the Philippines but Rise 80% Globally From Pre-Pandemic Levels.” TransUnion, 31 Mar. 2023, newsroom.transunion.ph/transunion-report-finds-digital-fraud-attempts-fall-18-in-the-philippines-but-rise-80-globally-from-pre-pandemic-levels. Accessed 27 Dec. 2023.

Agcaoili, Lawrence. “AMLC Warns Money Mules Scams Rising in Philippines.” Philstar.com, 12 Feb. 2023, www.philstar.com/business/2023/02/13/2244465/amlc-warns-money-mules-scams-rising-philippines#:~:text=In%20a%20report%20on%20money,the%20first%20quarter%20of%202022.

PageRank U.S. Patent — Method for node ranking in a linked database — Patent number 6,285,999

Arasteh, M., Alizadeh, S. A fast divisive community detection algorithm based on edge degree betweenness centrality. Appl Intell 49, 689–702 (2019). https://doi.org/10.1007/s10489-018-1297-9

Chaudhary, L., Singh, B. (2019). Community Detection Using an Enhanced Louvain Method in Complex Networks. In: Fahrnberger, G., Gopinathan, S., Parida, L. (eds) Distributed Computing and Internet Technology. ICDCIT 2019. Lecture Notes in Computer Science, vol 11319. Springer, Cham. https://doi.org/10.1007/978-3-030-05366-6_20

Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the national academy of sciences, 99(12), 7821–7826.

Safa El Ayeb, Baptiste Hemery, Fabrice Jeanne, Estelle Pawlowski Cherrier. Community Detection for Mobile Money Fraud Detection. 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS), Dec 2020, Paris, France. https://doi.org/10.1109/SNAMS52053.2020.9336578

Prusti, D., Das, D. & Rath, S.K. Credit Card Fraud Detection Technique by Applying Graph Database Model. Arab J Sci Eng 46, 1–20 (2021). https://doi.org/10.1007/s13369-021-05682-9

Y. Yoo, J. Shin and S. Kyeong, “Medicare Fraud Detection Using Graph Analysis: A Comparative Study of Machine Learning and Graph Neural Networks,” in IEEE Access, vol. 11, pp. 88278–88294, 2023, doi: 10.1109/ACCESS.2023.3305962.
