Application of Graph Convolution Algorithms at PayPal

Venkatesh Ramanathan
The PayPal Technology Blog
6 min read · Nov 18, 2020

Motivation

At PayPal, we apply large-scale machine learning to a variety of problems, such as detecting fraudulent activity, extending credit to merchants and consumers, and building marketing and recommendation solutions. We have successfully employed graph algorithms such as connected components, clustering, and centrality to unearth graph intelligence and feed it into our machine learning pipeline. We also proactively research and adopt state-of-the-art algorithms to continually improve the performance of our ML models. In this work, we explore a recent advancement in graph machine learning research, namely graph convolution, which enables end-to-end machine learning on graphs.

To apply this advancement at the scale of PayPal data, we developed an entire stack, including specialized hardware and software optimized for AI applications. Our approach is to write customer-merchant interaction data, in graph form, to a specialized graph store as interaction events occur. Because the data is indexed in this graph store, we can query the graph far more efficiently, which in turn makes it readily available for algorithmic consumption. Next, we present a brief overview of the graph convolution algorithm.

Graph Convolution Algorithm Overview

Graph Convolution Networks (GCNs) are based on the idea of message passing in a graph [1][2]. Message passing is a powerful concept because many graph algorithms can be interpreted and implemented efficiently in this paradigm. The idea is that a node in a graph can send and receive messages along its connections to its neighbors. This is typically accomplished in two steps. First, each node sends out a message about itself to its neighbors. Next, each node collects the messages received from its neighbors and uses them to update itself.
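The two steps can be sketched in a few lines of plain Python on a toy graph (the graph, values, and mean-aggregation choice here are illustrative, not part of any production system):

```python
# Toy undirected graph: node -> list of neighbors
graph = {0: [1, 2], 1: [0], 2: [0]}
values = {0: 1.0, 1: 3.0, 2: 5.0}

def message_passing_step(graph, values):
    updated = {}
    for node, neighbors in graph.items():
        # Step 1: each neighbor "sends" its current value as a message.
        inbox = [values[n] for n in neighbors]
        # Step 2: the node aggregates the received messages together
        # with its own value (mean aggregation) to update itself.
        updated[node] = (values[node] + sum(inbox)) / (1 + len(inbox))
    return updated

new_values = message_passing_step(graph, values)
```

Running several such steps smooths each node's value toward those of its neighborhood, which is the behavior the next section builds on.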

GCN is a simple message passing algorithm, similar to well-known graph algorithms such as label propagation. Unlike label propagation, in which only labels are passed as messages along edges to neighbors, a GCN does this for the entire vector of input features associated with a node. Thus, if label propagation can be thought of as a label-smoothing algorithm, a GCN can be thought of as a feature-smoothing algorithm.

In a GCN, a node first gathers the feature vectors of all its connected nodes and then applies an aggregation function (such as averaging) to them. Aggregation ensures that the node representation has the same size regardless of the number of neighbors the node receives messages from. It makes intuitive sense that the representation of a node can be inferred from its own features and the features of its neighbors. Next, the aggregated vector is passed through a dense neural network layer, and the output of the dense layer becomes the node's new vector representation. This process is repeated for every node in the graph. For the second GCN layer, the input is the updated representation from the first layer, and the process repeats. This is similar in concept to a fully connected neural network, except for the additional step at the beginning of each layer in which a node gathers the values of all its neighbors and aggregates them.
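A single layer of this kind can be sketched with NumPy, assuming mean aggregation and a ReLU activation (the sizes and random weights below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)

# Adjacency matrix with self-loops, so each node also keeps its own features.
A = np.array([[1, 1, 1],
              [1, 1, 0],
              [1, 0, 1]], dtype=float)
H = rng.normal(size=(3, 4))   # input node features: 3 nodes, 4 dims each
W = rng.normal(size=(4, 2))   # learnable weights of the dense layer

def gcn_layer(A, H, W):
    # Mean-aggregate each node's neighborhood (including itself)...
    H_agg = (A @ H) / A.sum(axis=1, keepdims=True)
    # ...then pass the aggregate through a dense layer with ReLU.
    return np.maximum(0.0, H_agg @ W)

H1 = gcn_layer(A, H, W)       # new 2-dimensional representation per node
```

Stacking a second call, `gcn_layer(A, H1, W2)`, corresponds to the second GCN layer described above: each additional layer lets information travel one more hop.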

The GCN model is trained to predict the probability that an edge exists between pairs of nodes in the graph. The training process optimizes a loss function that maximizes the predicted probability for pairs of nodes that are actually connected and minimizes it for disconnected pairs. The node representations obtained from training are then used to predict the probability that a connection between two nodes exists. Because a GCN requires only a fixed number of parameters, the model size does not depend on the size of the graph; GCNs therefore scale to very large graphs. Scalability can be further enhanced by sampling the neighboring nodes when obtaining the representation of a specific node. GCNs are also inductive: a representation can be obtained for a newly added node from its basic features and its connections in the graph. Relational GCNs (R-GCNs) extend GCNs to heterogeneous graphs, where nodes and edges of different types exist within the same graph.
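The edge-probability objective described above can be written as a per-pair loss with negative sampling. The following is a minimal sketch, assuming a dot-product edge score passed through a sigmoid (one common choice; the paper's exact scoring function may differ):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def link_loss(z_u, z_v, z_neg):
    # z_u, z_v: embeddings of a node pair that IS connected.
    # z_neg: embedding of a sampled node NOT connected to u.
    pos = sigmoid(z_u @ z_v)        # push real-edge probability up
    neg = sigmoid(z_u @ z_neg)      # push non-edge probability down
    return -np.log(pos) - np.log(1.0 - neg)
```

Embeddings that separate connected from disconnected pairs yield a lower loss, which is exactly what the optimizer rewards during training.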

Task Agnostic Application of Relational GCN

The graph representation of anonymized customer-merchant interaction data is heterogeneous, i.e., it has different vertex types (such as account number, merchant ID, and IP address) and multiple edge types: for example, send-money and receive-money edges between account vertices, and signup, login, etc. between account and IP vertices. A traditional GCN does not take such heterogeneity into account. Relational GCNs (R-GCNs) were developed specifically to deal with such highly multi-relational, realistic graphs [3].
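One common way to represent such a heterogeneous graph is as a mapping from (source type, edge type, destination type) triples to edge lists; the vertex and edge names below are illustrative, not PayPal's actual schema:

```python
# A tiny heterogeneous graph keyed by (src_type, edge_type, dst_type).
hetero_edges = {
    ("account", "send_money", "account"):    [("A1", "A2")],
    ("account", "receive_money", "account"): [("A2", "A1")],
    ("account", "login", "ip"):              [("A1", "IP9"), ("A2", "IP9")],
}

# An R-GCN aggregates messages separately per relation type and then
# combines them, so each edge type gets its own weight matrix.
relations = sorted(etype for (_, etype, _) in hetero_edges)
```

Libraries such as the Deep Graph Library accept essentially this structure when building a heterogeneous graph, with one parameter set per entry in `relations`.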


Our goal in applying R-GCN is to solve more than one application use case. The embedding vectors produced by the R-GCN give us the flexibility to do exactly that. We explored three use cases to see whether R-GCN can provide business value.

Link Prediction

One use case involves determining whether two accounts are linked when the interaction data contains no explicit edge between them.
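Since the R-GCN assigns every account an embedding, accounts with no explicit edge can still be compared in embedding space. A minimal sketch, assuming cosine similarity and a hypothetical threshold (both illustrative choices):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical learned embeddings for three accounts.
emb = {
    "A1": np.array([0.9, 0.1, 0.0]),
    "A2": np.array([0.8, 0.2, 0.1]),
    "A3": np.array([0.0, 0.1, 0.9]),
}

def linked(a, b, threshold=0.9):
    # Declare two accounts linked when their embeddings are close,
    # even if no edge between them was ever observed.
    return cosine(emb[a], emb[b]) >= threshold
```

In practice the threshold would be tuned against ground-truth linked pairs rather than fixed in advance.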

Recurring Subgraph Detection

The second application we explored is detecting recurring subgraphs (also known as temporal motifs) in our data. One example use helps us determine how often our users interact in a social setting as a group, and how large such groups are. We could use such patterns to predict the next group gathering and offer services such as group discounts for those events.
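A crude version of recurring-group detection can be sketched by bucketing interactions in time and counting groups that repeat; the events and time buckets below are toy data, not the motif-mining method actually used:

```python
from collections import Counter

# (time_bucket, payer, payee) interaction events -- toy data.
events = [
    (1, "A1", "M1"), (1, "A2", "M1"), (1, "A3", "M1"),
    (2, "A1", "M1"), (2, "A2", "M1"), (2, "A3", "M1"),
    (2, "A4", "M2"),
]

# Group payers that hit the same payee within the same time bucket,
# then count how often the exact same group recurs over time.
buckets = {}
for t, payer, payee in events:
    buckets.setdefault((t, payee), set()).add(payer)

groups = Counter()
for (t, payee), payers in buckets.items():
    groups[(payee, frozenset(payers))] += 1

recurring = {k: c for k, c in groups.items() if c > 1}
```

Here the trio A1, A2, A3 paying merchant M1 recurs across both time buckets, so it surfaces as a candidate recurring group; real temporal-motif mining generalizes this idea to arbitrary subgraph shapes.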

Feature Generation using Embeddings

The last application we explored was using the GCN embeddings as additional features in one of our ML models, namely collusion fraud detection. In collusion fraud, a buyer and seller collude to benefit financially from our buyer and seller protection policies. The intuition for using the embeddings as features is the potential of graph intelligence to detect complex fraud patterns that exploit the graph structure.
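Feeding embeddings into an existing tabular model amounts to concatenating them onto the hand-crafted feature matrix; the shapes below are placeholders:

```python
import numpy as np

# Hypothetical per-account feature blocks (5 accounts).
tabular = np.ones((5, 10))       # 10 hand-crafted tabular features
embeddings = np.zeros((5, 16))   # 16-dim R-GCN embeddings, row-aligned

# Concatenate column-wise so the downstream fraud model
# sees both feature families for each account.
X = np.hstack([tabular, embeddings])
```

The only real requirement is that the embedding rows be joined to the correct accounts; the downstream model is otherwise unchanged.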

The entire pipeline, from data preparation through model training and evaluation, ran on our in-house, large-scale AI/ML platform consisting of high-performance compute and storage clusters. For R-GCN training, we used the open-source Deep Graph Library [4], which has the broadest and most scalable implementation of GCN algorithms. The model was trained and evaluated on sampled one-year interaction data comprising billions of vertices and edges. In our experiments, we were able to link up to 60% of the accounts in the ground-truth data using the similarity computed between embedding vectors. We also saw promising results in detecting recurring social interactions. Lastly, we saw an incremental 7% performance improvement from using embeddings as additional features in our collusion model.

Future Direction

As seen from our work, relational GCNs show tremendous value for a variety of business use cases at PayPal. One algorithmic advancement we are now exploring is accounting for both time and space by building a spatio-temporal GCN [5]. For certain use cases, such as detecting money-laundering activity, we need to account not only for temporal changes but also for how geographically distributed the nodes in our graph are. This requires designing a new graph data model and a specialized loss function that weighs spatial proximity and recency differently.

If you are interested in learning more about our work on graphs, please feel free to reach out in the comments.

References

  1. William L. Hamilton, Rex Ying and Jure Leskovec: Inductive Representation Learning on Large Graphs. NeurIPS 2017.
  2. Thomas N. Kipf and Max Welling: Semi-Supervised Classification with Graph Convolutional Networks. ICLR 2017.
  3. Michael Schlichtkrull et al.: Modeling Relational Data with Graph Convolutional Networks. https://arxiv.org/pdf/1703.06103.pdf, Oct 2017.
  4. Deep Graph Library. https://docs.dgl.ai/index.html.
  5. Bing Yu, Haoteng Yin and Zhanxing Zhu: Spatio-Temporal Graph Convolutional Networks: A Deep Learning Framework for Traffic Forecasting. IJCAI 2018.
