
Illegal Bitcoin Transactions? Not on Graph ML’s Watch!

Jan 18, 2022


By Alex Donovan, Grant Hugh, and Kazuki Mogi as part of the Stanford CS224W course project.

Introduction

Many types of data can be interpreted as graphs. Examples include computer networks, molecules, food webs, disease paths, social networks, and public transportation systems. Despite the abundance of graph-based data, traditional machine learning methods are often not well suited to it. Luckily, there is a rapidly growing field, graph machine learning, with many dedicated researchers who have devised wonderful machine learning methods specifically for graphs.

Examples of graphs in the real world.

In this blog post, we explore one of those wonderful methods, GCN-LPA [1], a new graph machine learning technique proposed by Wang and Leskovec in 2020.

We hope that the reader gains a better understanding of GCN-LPA as well as graph ML techniques in general, and learns how to apply them to the real world!

Table of Contents

Here’s a quick overview of what’s in this post:

  • Graph ML Models: LPA, GCN, and GCN-LPA
  • Applying GCN-LPA to Detect Illicit Bitcoin Transactions
  • Discussion of Results
  • Conclusion

Graph ML Models: LPA, GCN, and GCN-LPA

In this section, we will first cover the mechanism behind LPA, then GCN, and finally GCN-LPA for the problem of node classification.

Notation Introduction

We begin by introducing relevant notation. Consider a graph G = (V, A, X, Y).

V = {v₁, …, vₙ} denotes the set of nodes in the graph.

A is the n × n adjacency matrix of the graph.

X is the n × p feature matrix of nodes, where the ith row is the feature vector of node vᵢ.

Y = {y₁, …, yₙ} stores the labels of the nodes in the graph. An unlabeled node vᵢ has the corresponding label yᵢ = 0.

Additionally, define D⁻¹ to be the diagonal matrix whose entry (D⁻¹)ᵢᵢ is the reciprocal of the degree of node vᵢ.

The goal of node classification is to learn a mapping from V to L (our set of labels), and predict the labels of unlabeled nodes.

LPA

The Label Propagation Algorithm (LPA) [2] is a graph ML technique that can be used to classify nodes in a graph.

LPA makes the assumption that two nodes connected by an edge are likely to have the same label. Under this assumption, LPA propagates labels iteratively along the edges. This process can be formalized by the following equation:

Yᵏ⁺¹ = D⁻¹AYᵏ

where Yᵏ is the soft label matrix after the kth iteration. We initialize row i of Y⁰ to a one-hot label indicator vector if node vᵢ has a label, and to a vector of zeros otherwise. We repeat this update k times. Before each iteration, we reset the rows of Yᵏ for nodes whose labels are given back to their one-hot indicator vectors, ensuring that known ground-truth labels persist and are not overpowered by the propagation.
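To make the update concrete, here is a minimal PyTorch sketch of this iteration, assuming a dense adjacency matrix A and a one-hot initial label matrix Y0 (the function and argument names are ours, following the description above rather than any official implementation):

```python
import torch

def label_propagation(A, Y0, labeled_mask, k=10):
    """Run k iterations of LPA: Y <- D⁻¹AY, re-clamping known labels."""
    deg = A.sum(dim=1, keepdim=True).clamp(min=1)  # node degrees (diagonal of D)
    Y = Y0.clone()
    for _ in range(k):
        Y[labeled_mask] = Y0[labeled_mask]  # reset rows with ground-truth labels
        Y = (A @ Y) / deg                   # propagate: D⁻¹AY
    return Y
```

After k iterations, the row for each unlabeled node holds a soft label distribution, and we predict the class with the largest value.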

Fig 1. First Few Iterations of LPA for Binary Classification (Given ground truth labels for node 1,2,3,4, we predict labels for node 5,6,7 via LPA)

GCN

A graph convolutional network (GCN) [3] is a specific instance of a graph neural network (GNN). A GNN is a neural network model that adapts the idea of convolutional neural networks (CNNs) to graph data in order to generate node embeddings, which are then used for downstream tasks such as node classification. For those not familiar with CNNs, please refer to this article or this paper. We often cannot directly apply CNNs to graphs because graphs have no fixed node ordering or fixed notion of locality, both of which are essential to CNNs, as illustrated in the figure below.

Fig. 2 Applying Convolution to Image vs Graph

A GNN overcomes this issue by using a node’s neighborhood as a computational graph and propagating information according to that computational graph. We give an example below in Fig. 3: for node A in the graph, its computational graph is represented by the diagram on the right. A is connected to its three neighbors B, C, and D, and each of these nodes is connected to their respective neighbors, and so on.

Fig. 3 Example of a computational graph. Image Credit

GCN, being a specific instance of GNN, uses this definition of a computational graph and computes node embeddings in the following steps:

  • Sending messages: each node sends a message (its node embedding) to each of its neighbors.
  • Aggregation: each node takes all of the messages from its neighbors and averages them.
  • Updating: each node updates its embedding based on the aggregation output.

The procedure above constitutes one layer in a GCN. Each additional layer adds another level of message passing, so with k layers, nodes that are k-hops away can influence each other’s embeddings. For instance, in Fig. 3, a 2-layer GCN would allow all nodes to influence all other nodes’ embeddings since all nodes in the input graph are 2-hops away from each other, as can be seen in the computational graph.

Using our notation, a GCN layer can be formalized by the equation

Xᵏ⁺¹ = σ(D⁻¹AXᵏWᵏ)

where Xᵏ is the matrix of node embeddings at the kth layer, Wᵏ is a learnable weight matrix for the kth layer, and multiplying by D⁻¹A performs the mean aggregation over neighbors described above. Finally, σ is a sigmoid function that maps values between 0 and 1.
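For intuition, a single layer of this form can be written in a few lines; this is a toy dense sketch rather than the optimized sparse implementation a library like PyTorch Geometric provides:

```python
import torch

def gcn_layer(A, X, W):
    """One GCN layer with mean aggregation: Xᵏ⁺¹ = σ(D⁻¹AXᵏWᵏ)."""
    deg = A.sum(dim=1, keepdim=True).clamp(min=1)  # node degrees (diagonal of D)
    H = (A @ X) / deg                              # average neighbor messages
    return torch.sigmoid(H @ W)                    # update with learnable weights
```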

GCN-LPA

One drawback of GCNs is that the aggregation function treats all neighbors equally (taking the mean), whereas we might want to weight the features of certain neighbors more heavily.

The GCN-LPA model introduces the idea of learning edge weights in addition to learning node embeddings. Specifically, we want to learn edge weights such that nodes from the same class are more strongly connected to each other, thus defining clearer node classes. This makes intuitive sense: a learned node embedding is good if all nodes of the same class have close embeddings, so by weighting intra-class edges (edges between two nodes of the same class) more heavily, the GCN will naturally push same-class embeddings together, resulting in better node classification.

Fig. 4 Computational Graph for GCN vs GCN-LPA. Note that intra-class edges are weighted more heavily in GCN-LPA, as denoted by thicker lines.

How do we promote learning edge weights that emphasize intra-class edges though? One of Wang and Leskovec’s [1] key findings was that

“increasing the strength of edges between the nodes of the same class is equivalent to increasing the accuracy of LPA’s predictions”

Thus, the overarching premise for GCN-LPA is that LPA serves as regularization for GCN to learn edge weights. To see the mathematical proof for why this method works, please refer to Section 2 in the original paper.

Now that we know that increasing intra-class edge weights is equivalent to increasing LPA accuracy, we can use LPA in conjunction with a normal GCN to define GCN-LPA. Simply put, in GCN-LPA, we want to find the optimal weights W* (which determine the node embeddings) and edge weights A* by setting

(W*, A*) = argmin over W, A of L_GCN(W, A) + λ · L_LPA(A)

The two L terms represent the cross-entropy loss functions for GCN and LPA, and λ is a balancing hyperparameter. So in the end, GCN-LPA has the added benefit of being relatively simple to implement.

Applying GCN-LPA to Detect Illicit Bitcoin Transactions

Now that we have covered the GCN-LPA model, let’s try applying it to a real-world scenario: identifying transactions as licit versus illicit on the Bitcoin blockchain. Don’t worry, you won’t need a thorough understanding of Bitcoin to follow along. The most important thing about the Bitcoin blockchain for us is that it can be nicely modeled as a directed graph. The nodes in this graph represent transactions posted to the blockchain, and edges represent the flow of Bitcoin between two transactions.

List of transactions on the Bitcoin blockchain. Image Credit

Our goal is to classify unlabeled nodes in a dataset as either licit or illicit. Since we have framed this as a node classification problem on a graph, we can try using a GCN-LPA model! In the following sections we will introduce the dataset we used and walk through the code we wrote to tackle this problem.

Dataset

We use the Elliptic Bitcoin Data Set [4], which was contributed by Weber and colleagues as part of their paper at the KDD 2019 Workshop on Anomaly Detection in Finance [5]. This dataset can be interpreted as a directed graph in which nodes represent transactions and edges represent flows of Bitcoin from one transaction to another. Below is a visualization of a subset of the dataset represented as a graph:

Fig 5. Visualization of a Subset of Elliptic Bitcoin Data Set. Made with NetworkX.

Each transaction is either licit or illicit. Yellow nodes in the figure represent illicit transactions while purple nodes represent licit transactions. Some transactions in the data set are unlabeled and we remove these nodes from the data set in our pre-processing step. We also remove isolated nodes from the data set. After these steps, the data set contains 35,874 nodes and 36,624 edges. 2,484 nodes are labeled as illicit, which is approximately 7% of the total nodes.

Every node has 165 recorded features as well as a timestamp that tells us when the transaction was posted. There are 49 distinct timestamps in total, and each timestamp contains a single connected component of transactions. We use the timestamps only to subset the dataset, due to resource constraints on Google Colab; we do not use them as features when training our models.
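As a concrete illustration, the pre-processing described above might look roughly like the sketch below with pandas. The file and column names follow the Kaggle download of the data set, but treat the exact names and the chosen time-step cutoff as our assumptions:

```python
import pandas as pd

# File names as they appear in the Kaggle download (assumed)
feats = pd.read_csv("elliptic_txs_features.csv", header=None)
classes = pd.read_csv("elliptic_txs_classes.csv")   # columns: txId, class
edges = pd.read_csv("elliptic_txs_edgelist.csv")    # columns: txId1, txId2

feats.columns = ["txId", "time_step"] + [f"feat_{i}" for i in range(165)]

# Drop unlabeled transactions and keep a few time steps (cutoff is illustrative)
labeled = classes[classes["class"] != "unknown"]
df = feats.merge(labeled, on="txId")
df = df[df["time_step"] <= 5]

# Keep only edges whose endpoints both survived the filtering
keep = set(df["txId"])
edges = edges[edges["txId1"].isin(keep) & edges["txId2"].isin(keep)]

# Finally, drop isolated nodes (those with no remaining edges)
connected = set(edges["txId1"]) | set(edges["txId2"])
df = df[df["txId"].isin(connected)]
```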

For more detailed information about the data set, please refer to Kaggle or the paper.

Code Walkthrough

Now let’s dive into our GCN-LPA model for classifying nodes as licit or illicit. We will be implementing a stacked GCN-LPA model where consecutive GCN layers are connected via batch normalization, ReLU, and dropout. The final output of the GCN-LPA model is passed through a Softmax layer to yield probabilistic estimates of the label.

We won’t be going through all of the code here, only the most important bits. Our full Colab notebook can be accessed here. Shoutout to PyTorch [6] and PyTorch Geometric [7] for powering our code!

Recall that for GCN-LPA, we simply need to implement a GCN component and an LPA component, which are tied together by common edge weight parameters and are updated simultaneously.

We start with the __init__ function of our GCNLPA class, where we define the GCN component.

We first define all of our convolutional layers. Here we lean heavily on PyTorch Geometric’s GCNConv, which already implements the GCN layer for us. We can treat it just like other layers in PyTorch, meaning that it suffices to provide some required information such as the input and output dimensions.

Then we define num_layers - 1 batch normalization layers so that we can apply batch normalization between all consecutive convolutional layers. Finally, we define a softmax function, a dropout percentage, a torch parameter to store our edge weights, and an integer k representing the number of LPA iterations we run.
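Here is a minimal sketch of such an __init__; the constructor signature and names like lpa_iters and num_edges are our own assumptions, not the exact notebook code:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

class GCNLPA(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim, num_layers=3,
                 num_edges=None, dropout=0.5, lpa_iters=10):
        super().__init__()
        # GCN layers: input -> hidden -> ... -> output
        dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [out_dim]
        self.convs = nn.ModuleList(
            [GCNConv(dims[i], dims[i + 1]) for i in range(num_layers)]
        )
        # num_layers - 1 batch-norm layers, one between each pair of convolutions
        self.bns = nn.ModuleList(
            [nn.BatchNorm1d(dims[i + 1]) for i in range(num_layers - 1)]
        )
        self.softmax = nn.Softmax(dim=1)
        self.dropout = dropout
        # Learnable edge weights, shared by the GCN and LPA components
        self.edge_weight = nn.Parameter(torch.ones(num_edges))
        self.k = lpa_iters  # number of LPA iterations
```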

Now let’s move to the forward function, which tells us how to propagate data through our GCN and run the iterations of LPA.

In the GCN, first we pass the node features, the edge index, and the sigmoid of the edge weight into our convolutional layer to get a new node embedding x. Then we do batch normalization, apply ReLU, and pass our x through a dropout layer. We finally apply softmax to the resulting x to squeeze it between 0 and 1.

Next we go to the second part of the forward function, which implements LPA.

Here we first initialize the labels as one-hot vectors. Next we lean on torch_geometric.utils.to_dense_adj, which takes our sparse adjacency matrix given by edge indices and edge attributes and returns the appropriate dense adjacency matrix. Then we run k iterations of LPA by repeatedly multiplying this matrix by our labels.
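Putting the two halves together, the forward pass might look like the following sketch (a method of the GCNLPA class above; argument names are ours):

```python
import torch
import torch.nn.functional as F
from torch_geometric.utils import to_dense_adj

# forward() continues the GCNLPA class sketched above
def forward(self, x, edge_index, y_onehot):
    w = torch.sigmoid(self.edge_weight)          # squash edge weights into (0, 1)

    # --- GCN component ---
    for i, conv in enumerate(self.convs):
        x = conv(x, edge_index, edge_weight=w)   # message passing + aggregation
        if i < len(self.convs) - 1:              # all but the final layer
            x = self.bns[i](x)
            x = F.relu(x)
            x = F.dropout(x, p=self.dropout, training=self.training)
    x = self.softmax(x)                          # probabilistic label estimates

    # --- LPA component ---
    adj = to_dense_adj(edge_index, edge_attr=w).squeeze(0)     # dense weighted A
    adj = adj / adj.sum(dim=1, keepdim=True).clamp(min=1e-12)  # row-normalize: D⁻¹A
    y_hat = y_onehot.float()
    for _ in range(self.k):                      # k rounds of label propagation
        y_hat = adj @ y_hat
    return x, y_hat
```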

Now we are ready to train the GCN-LPA model. We do 100 trials of training, with each trial following the four-step procedure below:

  1. Sample a set of hyperparameters from a predefined hyperparameter space.
  2. Train the model using the sampled set of hyperparameters on the train set.
  3. Evaluate the trained model with the validation set.
  4. Save the model parameters.

Note that we use binary cross-entropy loss for both the GCN and LPA components, and we set our GCN-LPA loss to be the GCN loss plus λ times the LPA loss, as discussed in the paper. Another quick thing to highlight is that the variable t is the cutoff for a node to be considered fraudulent (if the predicted score xₐ for a node vₐ is greater than t, then vₐ is classified as fraudulent).

Finally, in the evaluation stage, we run 100 trials using an objective function, objective_gcnlpa, and then choose the trained model with the best validation performance. We evaluate this chosen model on the test set and report the F1 score, test accuracy, precision, and recall.

Hyperparameter optimization is done with Optuna.
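For reference, a condensed sketch of what objective_gcnlpa might look like is below. The search space, epoch count, and helper objects (data, y_onehot) are illustrative assumptions rather than the exact notebook code:

```python
import optuna
import torch
import torch.nn.functional as F
from sklearn.metrics import f1_score

def objective_gcnlpa(trial):
    # 1. Sample hyperparameters from a predefined space
    hidden_dim = trial.suggest_categorical("hidden_dim", [32, 64, 128])
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    lam = trial.suggest_float("lambda", 0.1, 10.0, log=True)
    t = trial.suggest_float("threshold", 0.3, 0.7)  # fraud cutoff t

    model = GCNLPA(data.num_features, hidden_dim, out_dim=2,
                   num_edges=data.edge_index.size(1))
    opt = torch.optim.Adam(model.parameters(), lr=lr)

    # 2. Train on the training split
    model.train()
    for _ in range(200):
        opt.zero_grad()
        out, y_hat = model(data.x, data.edge_index, y_onehot)
        loss_gcn = F.binary_cross_entropy(out[data.train_mask],
                                          y_onehot[data.train_mask])
        loss_lpa = F.binary_cross_entropy(y_hat[data.train_mask].clamp(0, 1),
                                          y_onehot[data.train_mask])
        loss = loss_gcn + lam * loss_lpa  # GCN loss + λ · LPA loss
        loss.backward()
        opt.step()

    # 3. Evaluate on the validation split with cutoff t
    model.eval()
    with torch.no_grad():
        out, _ = model(data.x, data.edge_index, y_onehot)
    pred = (out[data.val_mask, 1] > t).long()  # column 1 = illicit probability
    return f1_score(data.y[data.val_mask].cpu(), pred.cpu())

# Run 100 trials; Optuna tracks the best validation F1 (step 4, saving model
# parameters, can be done inside the objective, e.g. with torch.save)
study = optuna.create_study(direction="maximize")
study.optimize(objective_gcnlpa, n_trials=100)
```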

Discussion of Results

The table below summarizes the results on the task of classifying nodes as licit vs. illicit. We found that GCN-LPA achieved superior performance in comparison to LPA and GCN (see Fig. 6 below). Recall that GCN-LPA improves on GCN by strengthening intra-class edge weights via LPA. GCN-LPA may therefore have performed better because our data satisfies the assumption that illicit transactions interact more with other illicit transactions.

Fig. 6 Comparing test performance for LPA, GCN, and GCN-LPA

However, it should be noted that the performance difference between GCN-LPA and GCN is relatively small. These results are in line with what Wang and Leskovec found when comparing GCN-LPA to GCN on five citation/co-authorship datasets [1].

Fig. 7 Comparing test performance for LPA, GCN, and GCN-LPA visually

It is also noteworthy that the nature of the Elliptic Bitcoin Data Set is radically different from the citation and co-authorship datasets used in the original paper by Wang and Leskovec, which speaks to the versatility of GCN-LPA as a graph ML technique. To further explore this point, it would be interesting to see how GCN-LPA performs on other domains of node classification, or on other tasks such as graph classification or community detection.

Conclusion

Our results demonstrate the effectiveness of GCN-LPA for detecting illicit transactions in Bitcoin transaction graphs. Earlier, we introduced LPA and GCN, models designed for classification tasks on graphs. GCN-LPA further improves upon the results of GCN by leveraging the strength of LPA to boost intra-class edge weights. We showed how GCN-LPA can be applied to the Elliptic Bitcoin Data Set, made straightforward by the versatility of PyTorch and PyTorch Geometric. GCN-LPA can similarly be applied to a wide variety of problems with graph data.

If this article sparked your interest in graph ML, we recommend that you watch lectures by Professor Leskovec on YouTube and/or that you try out PyTorch Geometric!

References

[1] Hongwei Wang and Jure Leskovec. Unifying Graph Convolutional Neural Networks and Label Propagation. CoRR, abs/2002.06755, 2020.

[2] Zhu, X., Lafferty, J., and Rosenfeld, R. Semi-Supervised Learning with Graphs. PhD thesis, Carnegie Mellon University, Language Technologies Institute, 2005.

[3] Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations, 2017.

[4] Elliptic, https://www.kaggle.com/ellipticco/elliptic-data-set

[5] M. Weber, G. Domeniconi, J. Chen, D. K. I. Weidele, C. Bellei, T. Robinson, C. E. Leiserson, “Anti-Money Laundering in Bitcoin: Experimenting with Graph Convolutional Networks for Financial Forensics”, KDD ’19 Workshop on Anomaly Detection in Finance, August 2019, Anchorage, AK, USA.

[6] Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., … Chintala, S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32 (pp. 8024–8035). Curran Associates, Inc.

[7] M. Fey and J. E. Lenssen. Fast Graph Representation Learning with PyTorch Geometric, 2019.
