Fraudulent Activity Recognition in Graph Networks

Aman Kansal
Stanford CS224W GraphML Tutorials
13 min read · May 15, 2023

By Ansh Khurana, Aman Kansal and Soumya Chatterjee

Introduction and Motivation

Graphs present a natural way of modelling a large variety of phenomena, including social networks, financial networks, e-commerce activity and so on. GNNs can seamlessly model interactions between users and their activities, such as transactions, reviews, and posts [1].

The world can be modelled as a Graph! (source)

In most networks, malicious users pose a great threat to the stability and experience of other ‘normal’ users. Malicious users can enter a network, produce fraudulent information, participate in fraudulent transactions and facilitate other harmful activities.

There are many malicious users on the internet with an intent to defraud benign users. We aim to apply our learnings from the course and catch these malign actors in graph networks! (source)

In our project, we aim to identify fraudulent users and activities through the strong inter-node modelling capabilities of GNNs. Identifying fraudulent users can yield widespread economic, social and mental-health benefits. We aim to utilize GNNs for fraudulent user detection on the following datasets.

Representation of fraudulent node detection in a graph (source).

The figure above shows a social network with some Fraud (malignant) actors and Normal (benign) actors. Our aim is to discover these malignant users in the graph through their node features and local graph structure. We reduce the problem of fraudulent activity recognition to a node-level classification task. More details on the specific datasets are provided below:

Datasets and Tasks

Fraud detection in DGraph social network in finance

Subgraph of the DGraph FinVolution dataset, with normal nodes in blue and fraudulent nodes in red. Edges are directed: an edge from user u to user v indicates that u listed v as an emergency contact.

We use DGraph-Fin [2], a social network in finance [3] that represents users of the FinVolution Group [4]. FinVolution is a leading fintech platform in China connecting underserved borrowers with financial institutions.

Each node in the graph is a FinVolution user, and an edge between two users u and v represents user u designating user v as their emergency contact, so the graph is directed. The task is to predict whether each user is fraudulent or normal. There are 3,700,550 nodes and 4,300,999 edges in the graph. Users are labeled 0 for normal and 1 for fraud: 1,210,092 users are labeled as Normal and 15,509 as Fraud. The dataset also provides 17 features based on user demographics, which we use as the initial node embeddings in our GNN framework.

This is a very large-scale graph with many unlabelled nodes, and even among the labeled nodes, very few are labeled as ‘fraud’. This creates a dataset with high class imbalance (the negative:positive class ratio is ~78), which makes our node-classification problem very realistic and challenging.

As described before, the provided node features for a user u will be used as the initial node embeddings (h_u⁰) of the node. We then abstract this as a general node classification problem, which is described in the Task Modelling section below.

We use the official splits provided by the dataset, with a 70/15/15 train-validation-test split ratio.

Fraudulent User Detection in Amazon Reviews

Examples of fake reviews found on Amazon (source).

The Amazon Review dataset [5] consists of product reviews for Musical Instruments. Users are represented as nodes in the graph. The graph consists of 11,994 nodes.

There are three types of relations in this heterogeneous graph, with the following counts:

  • U-P-U: 351,216 edges between users that have reviewed at least one common product.
  • U-S-U: 7,132,958 edges between users that have given at least one identical star rating within one week.
  • U-V-U: 2,073,474 edges between users in the top 5% of mutual review similarity (based on TF-IDF) among all users.

There are 821 users labeled as Fraudulent and 7,818 users labeled as Normal. The dataset also includes 25 handcrafted features for each user based on [6]. As with the DGraph-Fin dataset, we use these provided features as the initial node embeddings h_u⁰.

We use the default train/validation/test split provided with the dataset, with a 70/10/20 train-validation-test split ratio.

Task Modelling and Problem Statement

For both of these use cases, we abstract the problem statement as follows:

In general, given a graph G = (V, E) with initial node embeddings h_u⁰ provided for each node u, the goal is to predict the probability that each node is fraudulent. We use Graph Neural Networks (GNNs) to model the relationships between different users and to make our predictions y_u.
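To make this concrete, below is a minimal training-loop sketch for this abstraction. The model, data object, and split masks are assumptions following the usual PyG conventions, not our exact code.

```python
import torch
import torch.nn.functional as F

# Minimal sketch, assuming `model` maps (x, edge_index) to per-node logits and
# `data` follows PyG conventions (data.x, data.edge_index, data.y, data.train_mask).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    logits = model(data.x, data.edge_index)          # per-node class logits
    loss = F.cross_entropy(logits[data.train_mask],  # supervised nodes only
                           data.y[data.train_mask])
    loss.backward()
    optimizer.step()
```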

Models

We briefly describe the baseline models we use for the node-level classification task. Since these are covered in our lecture material, we focus our discussion on the novel contributions and modeling changes we employ beyond these baselines.

Representing GNNs as Message Passing networks.

Consider a graph G = (V, E). Each node has an initial node embedding h_u⁰. In Graph Neural Networks (GNNs), we model an L-layer Message Passing network [7] that can aggregate messages over L-hop neighborhoods. There are two important design considerations in creating a GNN: the MESSAGE() function, which describes the message passed from a node u to its neighbor v, and the AGGREGATE() function, which defines how the messages from all neighbors of the current node are combined to update the central node’s embedding.

GNN Update Rule:

1. Message computation:

m^(l)_{u→v} = MESSAGE^(l)( h_u^(l−1) )

2. Aggregation:

h_v^(l) = AGGREGATE^(l)( { m^(l)_{u→v} : u ∈ N(v) }, h_v^(l−1) )

We will describe all implemented variants below with this general formulation of message-passing GNNs.

For node-level classification, we feed the final learned node embedding into a classification head that outputs the classification probabilities.
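A minimal sketch of this encoder-plus-head design in PyG (the layer type and sizes are illustrative, not our exact architecture):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GNNNodeClassifier(torch.nn.Module):
    """L message-passing layers followed by a linear classification head."""
    def __init__(self, in_dim, hidden_dim, num_layers, num_classes=2):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * num_layers
        self.convs = torch.nn.ModuleList(
            GCNConv(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:]))
        self.head = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        for conv in self.convs:
            x = F.relu(conv(x, edge_index))  # one round of message passing
        return self.head(x)                  # per-node class logits
```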

Implemented Models

MLP

We implement a simple Multi-Layer Perceptron as a baseline network to showcase the benefits of graph-based modeling.
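A minimal sketch of such a baseline; it deliberately ignores the graph structure and classifies each node from its features alone (sizes are illustrative):

```python
import torch
import torch.nn.functional as F

class MLPBaseline(torch.nn.Module):
    """Graph-agnostic baseline: classifies each node from its own features."""
    def __init__(self, in_dim, hidden_dim, num_classes=2):
        super().__init__()
        self.fc1 = torch.nn.Linear(in_dim, hidden_dim)
        self.fc2 = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index=None):  # edge_index accepted but unused
        return self.fc2(F.relu(self.fc1(x)))
```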

GCN

Graph Convolutional Networks (GCNs) [8] are a generalization of traditional Convolutional Neural Networks (CNNs). GCNs model a Message Passing network [7] in which messages are passed from the neighbor nodes to the central node. In a GCN, the messages passed are the node embeddings themselves, and the aggregation function is simply the mean:

h_v^(l) = σ( W^(l) · mean( { h_u^(l−1) : u ∈ N(v) ∪ {v} } ) )

Message aggregation in GCNs. We assume that self-loops are added to the graph, so that the central node’s embedding is also included in the aggregate.
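A minimal sketch of this mean-aggregation update written with PyG’s MessagePassing class (a simplification that assumes the mean-aggregation form above, rather than the symmetric normalization of the original GCN paper):

```python
import torch
from torch_geometric.nn import MessagePassing
from torch_geometric.utils import add_self_loops

class MeanGCNConv(MessagePassing):
    """GCN-style layer: MESSAGE() = neighbor embedding, AGGREGATE() = mean."""
    def __init__(self, in_dim, out_dim):
        super().__init__(aggr='mean')
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x, edge_index):
        # add self-loops so the central node is included in the aggregate
        edge_index, _ = add_self_loops(edge_index, num_nodes=x.size(0))
        return self.propagate(edge_index, x=self.lin(x))

    def message(self, x_j):
        return x_j  # the message is the (transformed) neighbor embedding
```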

GraphSAGE

In GraphSAGE [9] networks, we modify the aggregation function of GCNs to use the central node’s embedding in a more expressive manner: the central node’s embedding is concatenated with the aggregate over the neighbor embeddings, as shown below.

h_{N(v)}^(l) = AGGREGATE( { h_u^(l−1) : u ∈ N(v) } )

Intermediate aggregation of neighbor embeddings used in the final update of GraphSAGE.

h_v^(l) = σ( W^(l) · CONCAT( h_v^(l−1), h_{N(v)}^(l) ) )

Message aggregation in GraphSAGE networks.
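In practice we can rely on PyG’s SAGEConv for this update. Internally it applies separate linear maps to the central node and to the aggregated neighbors and sums the results, which is mathematically equivalent to the CONCAT-then-transform form above:

```python
from torch_geometric.nn import SAGEConv

# sizes are illustrative: 17 input features (as in DGraph-Fin), 64 hidden dims
conv = SAGEConv(in_channels=17, out_channels=64)
# h = conv(x, edge_index)  # x: [num_nodes, 17] -> h: [num_nodes, 64]
```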

Novel Models

In addition to the base methods covered in the course, we explore several advanced GNN modeling techniques such as the Chebyshev Convolutional Network [10] and Pick-and-Choose GNN [11].

Chebyshev Graph Convolutional Networks (ChebNet)

We have looked at the message-passing implementation of Graph Neural Networks. An alternative approach applies linear operators in the Fourier basis obtained using the graph Laplacian. The graph Laplacian is defined as L = D − A, where A is the adjacency matrix and D is the degree matrix. We can define polynomials in the Laplacian L as:

p_w(L) = w_0·I + w_1·L + w_2·L² + … + w_m·L^m

Using these polynomials, we can define convolutions as:

x′ = p_w(L)·x

This takes as input the embeddings x of all nodes and outputs updated embeddings for these nodes, forming one GNN layer. Consider p_w(L) as a degree-m polynomial in L. With different values of m, such a GNN layer applied to the graph features in the Fourier basis can capture the m-hop neighborhood of each node. We can see this by expanding:

x′ = w_0·x + w_1·L·x + … + w_m·L^m·x

With m = 0, the resulting node embeddings are simply proportional to the input embeddings. With m = 1, the one-hop neighborhood of each node is captured due to the presence of the term A·x inside L·x = (D − A)·x. It can be shown that this holds for larger values of m too: a degree-m polynomial reaches the m-hop neighborhood.

Our implementation for ChebNet. We use Chebyshev Convolutions (ChebConv) instead of traditional Graph Convolutions in ChebNets.
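Since the snippet above is rendered as an image, here is a minimal sketch along the same lines using PyG’s ChebConv (layer count, sizes, and the filter size K are illustrative):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import ChebConv

class ChebNet(torch.nn.Module):
    """ChebConv layers in place of graph convolutions; K is the Chebyshev
    filter size, i.e. the neighborhood radius covered by a single layer."""
    def __init__(self, in_dim, hidden_dim, num_classes=2, K=3):
        super().__init__()
        self.conv1 = ChebConv(in_dim, hidden_dim, K=K)
        self.conv2 = ChebConv(hidden_dim, hidden_dim, K=K)
        self.head = torch.nn.Linear(hidden_dim, num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        return self.head(x)
```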

Chebyshev convolution has a similar formulation, but uses Chebyshev polynomials instead of the general polynomials in the formulation above. An interesting property of Chebyshev polynomials is that they yield polynomial approximations of continuous functions that are close to the best possible polynomial approximation [12]. This property makes the optimization more numerically stable.

One issue with these polynomials is that powers of the Laplacian can blow up, since its eigenvalues are not bounded. To avoid this, the Chebyshev polynomials in Chebyshev Convolutions (ChebConvs) are defined in terms of the normalized Laplacian:

L̃ = (2/λ_max)·L − I

Here, rescaling by the maximum eigenvalue λ_max guarantees that the eigenvalues of the normalized Laplacian are bounded in the range [−1, 1].

Finally, using the normalized Laplacian, we can define the Chebyshev polynomials used in Chebyshev graph convolution layers via the standard recurrence:

T_0(L̃) = I,  T_1(L̃) = L̃,  T_k(L̃) = 2·L̃·T_{k−1}(L̃) − T_{k−2}(L̃)

The advantage of this is that a single ChebConv layer can aggregate information from a multi-hop neighborhood, whereas a message-passing GNN layer can only do so from a single-hop neighborhood. As we will see in the results, this allows ChebConv to achieve competitive performance with far fewer parameters than message-passing GNNs. However, computing the Laplacian polynomials is expensive, and scaling to large graphs can be challenging. In fact, we had to use a smaller hidden dimension for ChebConv to ensure that it fits in GPU memory.

Pick-and-Choose GNNs (PC-GNNs)

As described in the Datasets and Tasks section, our graph datasets follow real-world data distributions and thus exhibit a label bias towards the “Normal” label. This means we have many more samples of the negative “Normal” class than of the positive “Fraud” class, which makes node classification harder due to the class-imbalance problem.

To counter this, we explored research papers and methods designed to tackle this problem in the context of fraud detection. In this section, we showcase “Pick and Choose: A GNN-based Imbalanced Learning Approach for Fraud Detection” by Liu et al. [11].

Overview of the PC-GNN framework. (source)

PC-GNN devises a label-balanced subgraph sampling scheme in the PICK step. In the CHOOSE step, it designs a smart neighborhood sampler that over-samples the neighborhoods of minority-class nodes with nodes of the same class and under-samples the neighborhoods of the majority class.

This allows learning better node embeddings for the minority class, as we can draw on features from other nodes of the same class and their (L−1)-hop neighborhoods, yielding more robust features and making more effective use of the training data.

We implement PC-GNNs from scratch using the simple yet powerful PyG MessagePassing class [13]!

Code snippet for PC-GNN Conv layers.
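The snippet above is rendered as an image; below is a minimal sketch of a PC-GNN-style convolution built on MessagePassing. The names are illustrative, and the label-balanced pick/choose sampling is assumed to happen outside the layer, which then simply aggregates over the re-balanced neighborhood:

```python
import torch
from torch_geometric.nn import MessagePassing

class PCGNNConv(MessagePassing):
    """Mean aggregation over a neighborhood that has already been re-balanced
    by the pick/choose sampling steps (performed outside this layer)."""
    def __init__(self, in_dim, out_dim):
        super().__init__(aggr='mean')
        self.lin_root = torch.nn.Linear(in_dim, out_dim)
        self.lin_neigh = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x, edge_index):
        # edge_index holds the edges of the re-sampled, label-balanced graph
        neigh = self.propagate(edge_index, x=x)
        return torch.relu(self.lin_root(x) + self.lin_neigh(neigh))

    def message(self, x_j):
        return x_j
```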

PC-GNN learns a distance metric, parameterized by U, that guides the over-sampling of minority-class neighborhoods and the under-sampling of majority-class ones. The metric is trained to minimize the distance between nodes of the same class and to increase the distance between nodes of different classes.
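A sketch of one way such a metric can be set up, assuming a simple linear projection and a contrastive-style objective; the exact parameterization in the paper may differ:

```python
import torch
import torch.nn.functional as F

class LearnedDistance(torch.nn.Module):
    """d(u, v) = ||f(h_u) - f(h_v)||_2 with a learnable linear map f."""
    def __init__(self, dim, proj_dim=16):
        super().__init__()
        self.proj = torch.nn.Linear(dim, proj_dim)

    def forward(self, h_u, h_v):
        return (self.proj(h_u) - self.proj(h_v)).norm(dim=-1)

def distance_loss(dist, same_class, margin=1.0):
    # pull same-class pairs together; push different-class pairs past `margin`
    pos = same_class.float() * dist.pow(2)
    neg = (1 - same_class.float()) * F.relu(margin - dist).pow(2)
    return (pos + neg).mean()
```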

Heterogeneous GNNs

For the Amazon fraudulent reviews dataset, we need to model the different relationship types described in the Datasets and Tasks section, namely r = {U-P-U, U-S-U, U-V-U}.

To perform node classification on this dataset, we implement relation-aware HeteroGNN models [14].

To this end, we extend our GCN and GraphSAGE implementations to model different relationship types, and then aggregate messages over different relationships to compute the node embedding updates.

Code snippet for HeteroGNNs.

The update we use for the r-GCN model extends the GCN update with one transform per relation:

h_v^(l) = σ( Σ_r Σ_{u ∈ N_r(v)} (1/|N_r(v)|) · W_r^(l) · h_u^(l−1) )

Here W_r represents the weights of the transform applied to the previous layer’s node embeddings for relation r.
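A minimal sketch of this per-relation design (relation names and the mean combination across relations are illustrative; PyG’s HeteroConv wrapper offers similar functionality out of the box):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class RGCNLayer(torch.nn.Module):
    """One graph convolution per relation; per-relation outputs are averaged
    into a single update for each node."""
    def __init__(self, in_dim, out_dim, relations=('upu', 'usu', 'uvu')):
        super().__init__()
        self.convs = torch.nn.ModuleDict(
            {r: GCNConv(in_dim, out_dim) for r in relations})

    def forward(self, x, edge_index_dict):
        # edge_index_dict maps each relation name to its edge_index tensor
        outs = [conv(x, edge_index_dict[r]) for r, conv in self.convs.items()]
        return F.relu(torch.stack(outs).mean(dim=0))
```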

We test the relational versions of our models: r-GCN, r-GraphSAGE and r-ChebNet on the Amazon reviews dataset and report the performance for fraudulent user detection.

Results and Insights

Metrics

Since our fraudulent activity recognition problem is abstracted as a node-level classification task, we employ well-known classification metrics to evaluate our various approaches.

Since our classification task is binary (0 for a Normal node, 1 for a Fraudulent node), we pick the Area Under the Receiver Operating Characteristic curve (AUROC) as our main metric, since it is threshold-independent.

We plot the ROC curves for the top-performing variants of each model. We compare the models based on their AUROC score. In addition, we employ other common classification metrics such as accuracy and F1-score.
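A minimal sketch of how these metrics can be computed from per-node logits, assuming scikit-learn and the variable names below (logits, labels y, and a boolean split mask):

```python
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score, accuracy_score, f1_score

# logits: [num_nodes, 2] model outputs; y: labels; mask: boolean split mask
probs = F.softmax(logits[mask], dim=-1)[:, 1]   # P(fraud) per node
preds = (probs > 0.5).long()                    # threshold only for acc / F1
auroc = roc_auc_score(y[mask].cpu(), probs.detach().cpu())  # threshold-free
acc = accuracy_score(y[mask].cpu(), preds.cpu())
f1 = f1_score(y[mask].cpu(), preds.cpu())
```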

We report performance for all splits — train/validation/test.

DGraph-Fin

As described in the Datasets section, our goal for the DGraph-Fin dataset is to detect fraudulent users (nodes). To this end, we compare the AUC obtained by all five models in the figure below.

Comparison of various methods on DGraphFin dataset

The obtained AUC values are summarized in the table below:

DGraph-Fin AUC Results for Fraud user detection.

We observe that our novel ChebNet model achieves close to the best performance with less than 1/7th of the parameters of the top-performing model. GraphSAGE was our top-performing model, and also had the largest number of parameters among all the implemented models.

Receiver Operating Characteristic Curve for all five models.

In the figure above, we plot the ROC curve with the True Positive Rate (y-axis) against the False Positive Rate (x-axis) over the entire range of classification thresholds. We observe that all graph-based models perform very well on the DGraphFin fraudulent user detection task, supporting our hypothesis that graph-based learning methods can reliably learn network effects and malignant activity from fraudulent users.

Amazon Reviews

On the Amazon reviews dataset, we need to identify fraudulent review writers based on relationships between all the reviews of a product and all the reviews a particular user writes. This creates a graph structure based on the written reviews themselves and the activity of all users on Amazon. For this task, our relation-aware GNN models had to model three different relationship types (r = {U-P-U, U-S-U, U-V-U}), as described in the Datasets and Tasks section.

We compare the AUC obtained by our three relation-aware GNN models in the plot below:

Comparison of various methods on the Amazon Reviews dataset where the underlying graph is heterogenous with three different relation types.

We also plot the ROC curve for this binary classification task that compares the True Positive Rate and False Positive Rate over the range of all thresholds.

ROC Curves for all three relational GNN models.

We observe that our r-ChebNet model outperforms both r-GCN and r-GraphSAGE. All graph-learning-based models perform very well, obtaining 92% AUC on the task, as summarized in the table below:

AUC results for Fraud user detection on the Amazon review dataset.

We also report other common metrics, namely classification accuracy and F1-score, in the tables below:

Accuracy results for Fraud user detection on the Amazon review dataset.
F1-score results for Fraud user detection on the Amazon review dataset.

We observe that r-ChebNet is the best performing model on AUC, while r-GraphSAGE is the best performing model on the accuracy and F1-score metrics.

Conclusion

In this blog post, we explored the major socio-economic challenge of fraudulent user detection in online networks. With the powerful network-modelling capabilities of GNNs, we were able to achieve impressive performance on this node classification task across two real-world datasets. We tested five different approaches, including those learned in the course as well as two novel methods, ChebNet and PC-GNN. We observed that all graph-based networks (GNN variants) significantly outperform an MLP with a similar number of parameters, demonstrating the effectiveness of graph-based learning in reliably modelling network effects and the malignant activities of fraudulent users. In particular, ChebNet was one of the strongest-performing models, as it can model multi-hop dependencies within a single layer while using far fewer parameters.

You can find our code in this Google Colab notebook here.

References

  1. Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, & Maosong Sun. (2021). Graph Neural Networks: A Review of Methods and Applications.
  2. Xuanwen Huang, Yang Yang, Yang Wang, Chunping Wang, Zhisheng Zhang, Jiarong Xu, & Lei Chen (2022). DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
  3. DGraph-Fin dataset webpage (link)
  4. Finvolution group webpage (link)
  5. Dou, Y., Liu, Z., Sun, L., Deng, Y., Peng, H., & Yu, P. (2020). Enhancing Graph Neural Network-Based Fraud Detectors against Camouflaged Fraudsters. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management (pp. 315–324). Association for Computing Machinery.
  6. Zhang, S., Yin, H., Chen, T., Hung, Q., Huang, Z., & Cui, L. (2020). GCN-Based User Representation Learning for Unifying Robust Recommendation and Fraudster Detection. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 689–698). Association for Computing Machinery.
  7. Thomas N. Kipf, & Max Welling (2017). Semi-Supervised Classification with Graph Convolutional Networks. In 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24–26, 2017, Conference Track Proceedings.
  8. Daigavane, et al., “Understanding Convolutions on Graphs”, Distill, 2021.
  9. William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 1025–1035.
  10. Defferrard, M., Bresson, X., & Vandergheynst, P. (2016). Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in neural information processing systems (pp. 3844–3852).
  11. Yang Liu, Xiang Ao, Zidi Qin, Jianfeng Chi, Jinghua Feng, Hao Yang, and Qing He. 2021. Pick and Choose: A GNN-based Imbalanced Learning Approach for Fraud Detection. In Proceedings of the Web Conference 2021 (WWW ‘21). Association for Computing Machinery, New York, NY, USA, 3168–3177.
  12. Rivlin, T. J. (2020). Chebyshev polynomials. Courier Dover Publications.
  13. Fey, M., & Lenssen, J. E. (2019). Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds.
  14. Wang, S., Wang, Z., Li, J., Yao, Q., Chen, L., & Tong, H. (2019). Heterogeneous graph neural networks. In The World Wide Web Conference (pp. 2022–2032).
