An overview of graph neural networks for anomaly detection in e-commerce

Part 1. An introduction to using network information in e-commerce transactions

Ahmad Khodayari
Walmart Global Tech Blog
9 min read · Nov 3, 2021


The graphs and networks discussed in this article represent relationships or interactions between the entities (customer accounts, devices, payment methods, etc.) involved in a typical online shopping or e-commerce environment. Graphs capture this connection data in addition to the transaction-specific features that are fed into machine learning models for various tasks. This is particularly useful for anomaly detection, since fraudsters and abusers tend to exploit multiple entities and customer accounts simultaneously. A graph neural network (GNN) adds to the benefit of using graphs by applying neural networks and deep learning techniques within the graph learning and inference process.

In Part 1 of this blogpost (the current article), we review the terminology used in graph and network analysis and provide a short introduction to the application of these methods in e-commerce. Then, in Part 2, we review a few popular graph-based methods and GNN techniques that can be used in various applications, including anomaly detection in e-commerce.

This article is a follow-up to a previous blogpost (by the author), entitled “Deep learning for fraud detection in retail transactions”; here we discuss coupling graph and network information with deep learning techniques toward a richer exploration of the information hidden in e-commerce and retail transactions.

What is the graph information in e-commerce?

Various entities are involved in an e-commerce environment. A transaction represented as a graph comprises the interactions among a customer account, the device used to make the purchase, a payment method, addresses, etc.

Figure 1. A sample graph of interactions in an e-commerce environment. Note that in general, node interactions have a time component, so each edge could represent multiple interactions between two nodes at various points in time. For simplicity, not all nodes or entities involved in an e-commerce environment are shown here.

A customer account may be connected to several entities, and those entities are in turn connected to other customer accounts (see Figure 1). These relationships are useful for anomaly detection, where the goal is to reduce loss due to fraud and abuse. Graph representation and GNN reasoning help anomaly detection by efficiently exploiting the local (level-1, or neighboring) network information around a transaction and further exploring more global information (higher-level connections and distant neighbors).

A graph consists of nodes and edges. When all nodes (entities) are of the same type, the graph is called homogeneous. A graph with multiple types of nodes and/or multiple types of edges between them is called a heterogeneous graph (or heterograph). An e-commerce transaction graph is heterogeneous: a transaction is represented by connections among different kinds of nodes (a customer account, a payment method, a device, an address, etc.), and the edges between these nodes capture different types of information. For example, the relationship between a customer account and a device is different from the relationship between a customer account and a payment method.
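As a rough illustration (not the exact schema used in production), the snippet below builds a tiny heterogeneous transaction graph with NetworkX, storing the entity type on each node and the relation type on each edge; the node names and relation labels are made up for the example.

```python
import networkx as nx

# A toy heterogeneous transaction graph: node "kind" and edge "relation"
# attributes distinguish the different entity and interaction types.
G = nx.Graph()
G.add_node("acct_1", kind="customer_account")
G.add_node("dev_1", kind="device")
G.add_node("card_1", kind="payment_method")
G.add_node("addr_1", kind="address")

G.add_edge("acct_1", "dev_1", relation="logged_in_from")
G.add_edge("acct_1", "card_1", relation="paid_with")
G.add_edge("acct_1", "addr_1", relation="shipped_to")

# The same device reused by a second account creates a cross-account link,
# which is exactly the kind of signal exploited for anomaly detection.
G.add_node("acct_2", kind="customer_account")
G.add_edge("acct_2", "dev_1", relation="logged_in_from")

print(list(G.neighbors("dev_1")))  # ['acct_1', 'acct_2']
```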

We also need to consider the time dimension in an e-commerce graph. E-commerce transactions differ each time, and anomaly patterns or MOs (modus operandi) evolve over time. A dynamic graph (as opposed to a static graph) is therefore better suited to capturing the changing graph topology, interactions, and node attributes. Examples of this evolving nature include: a new customer makes a purchase, a new device is used, or a different payment method or address appears.
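One lightweight way to capture this time dimension (a sketch of the idea, not the approach used in a production system) is to keep multiple timestamped edges between the same pair of nodes and slice them into snapshots. Here we assume a NetworkX MultiGraph with a hypothetical `ts` edge attribute.

```python
import networkx as nx

# MultiGraph allows repeated interactions between the same two nodes,
# each carrying its own timestamp.
G = nx.MultiGraph()
G.add_edge("acct_1", "dev_1", ts="2021-10-01", relation="logged_in_from")
G.add_edge("acct_1", "dev_1", ts="2021-10-15", relation="logged_in_from")
G.add_edge("acct_1", "card_1", ts="2021-10-15", relation="paid_with")

def snapshot(g, start, end):
    """Return the subgraph of interactions that fall inside a time window."""
    s = nx.MultiGraph()
    s.add_nodes_from(g.nodes(data=True))
    for u, v, data in g.edges(data=True):
        if start <= data["ts"] <= end:
            s.add_edge(u, v, **data)
    return s

oct_snapshot = snapshot(G, "2021-10-10", "2021-10-31")
print(oct_snapshot.number_of_edges())  # 2
```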

There is an inherent delay in obtaining performance data and feedback, including true labels for anomaly detection. For example, it takes time for a legitimate customer to spot and report unauthorized transactions in their credit card history. Such reports are then sent from the bank to the retailer, which is charged back the funds plus a fee, resulting in fraud loss. This information is stored as attributes on the relevant graph nodes.

Let’s consider two ways to prepare data for building graphs or performing graph analysis:

  • Graph is mostly available. Assume an e-commerce management system already keeps track of the interactions occurring between a set of selected entities in a transaction. We can then use the entire graph, or apply some form of random sampling over edges (e.g., a region-growing algorithm) to find the neighboring nodes. This produces effective subgraphs, for which we then generate graph embeddings for further analysis.
  • When the graph is not readily available for graph analysis, one can use existing databases (such as relational tables) to identify the building blocks (nodes and edges) and construct the graph from them (a sketch of both cases follows below).
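As a rough sketch of the second option (and of the neighborhood sampling mentioned in the first), assume interactions are stored in a relational table with hypothetical `source`, `target`, and `relation` columns; we can turn it into a graph with pandas and NetworkX and then pull the k-hop subgraph around a node of interest.

```python
import pandas as pd
import networkx as nx

# Hypothetical interaction table exported from a relational store.
edges = pd.DataFrame({
    "source":   ["acct_1", "acct_1",    "acct_2", "acct_2"],
    "target":   ["dev_1",  "card_1",    "dev_1",  "addr_9"],
    "relation": ["uses",   "pays_with", "uses",   "ships_to"],
})

# Build the graph, keeping the relation type as an edge attribute.
G = nx.from_pandas_edgelist(edges, source="source", target="target",
                            edge_attr="relation")

# "Region growing" around a transaction's customer account: the 2-hop
# neighborhood subgraph that downstream embedding steps would consume.
subgraph = nx.ego_graph(G, "acct_1", radius=2)
print(sorted(subgraph.nodes()))  # ['acct_1', 'acct_2', 'card_1', 'dev_1']
```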

Problem definition in graph analysis

Assume that most historical data or past transactions (each translated into nodes and edges of a graph) have labels. Once the graph is formed, the next step in graph analytics is one of the following tasks:

  • Node classification or “entity classification”: the objective is to determine whether a node of a given graph is an anomaly (a minimal sketch follows after this list).
  • Link classification or “edge classification”: to determine whether an interaction between two nodes is anomalous.
  • Link prediction: to predict which nodes are likely to be linked or connected. As a result of graph analysis we can assign predicted or projected labels to each node or edge of the graph, and such predictions can also be used to recover missing facts or labels; a missing connection between nodes may likewise be discovered or recovered this way.
  • Graph classification: to determine whether the graph or subgraph of an e-commerce transaction (including a node and its set of neighbors up to a certain neighborhood degree) belongs to a certain class. Here the output decision function could be, for example, a neural network classifier that takes graph attributes and node embeddings as input.
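To make the first task concrete, here is a minimal node-classification sketch using PyTorch Geometric (an assumption of this example, not necessarily the stack used in practice): a two-layer graph convolutional network that scores every node of a small graph as normal vs. anomalous. The feature dimensions, the toy edge index, and the labels are all made up.

```python
import torch
import torch.nn.functional as F
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

# Toy graph: 4 nodes, 8-dimensional random features, binary node labels.
x = torch.randn(4, 8)
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])  # undirected edges, both directions
y = torch.tensor([0, 0, 1, 0])                   # 1 = anomalous node
data = Data(x=x, edge_index=edge_index, y=y)

class NodeClassifier(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, n_classes=2):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, n_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

model = NodeClassifier(in_dim=8, hidden_dim=16)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for _ in range(50):
    optimizer.zero_grad()
    logits = model(data.x, data.edge_index)
    loss = F.cross_entropy(logits, data.y)
    loss.backward()
    optimizer.step()

pred = model(data.x, data.edge_index).argmax(dim=1)  # per-node anomaly prediction
```

The same skeleton extends to the other tasks: edge classification and link prediction score pairs of node embeddings, and graph classification pools the node embeddings into a single vector before the final classifier.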

Related topics and methods

To give a more comprehensive overview of graph analysis techniques, we first review some related terminology and methods:

  • “Network measures and graph feature extraction”: this topic covers traditional graph and network attributes or measures such as node clusters, characteristic path length, node degree, mean clustering coefficient, local efficiency, and PageRank centrality. These are useful numerical graph and network features that can be used alongside other attributes as input to a traditional machine learning model (a sketch follows after this list).
  • “Graph mining” or “graph knowledge discovery”: a general topic covering various kinds of data analysis, learning, and inference over graph data to extract useful information. One such method is “label propagation” (similar to entity prediction), in which we start with known node labels and then use the graph structure to identify suspicious nearby neighbors that have not yet been flagged as anomalies. Another thread of this topic is finding repetitive subgraphs and useful patterns.
  • Graph clustering: since dense clusters are highly correlated with abnormal activity, graph “density clustering” can be used for anomaly detection. “Community detection” is a related method for grouping a graph into densely connected subsets; in other words, it identifies densely linked clusters of nodes.
  • Graph similarity estimation: for a given graph, “similarity ranking” is the problem of predicting how similar two nodes are; predicting a link between two nodes is one application. Likewise, “network similarity” detection measures how similar two sub-networks (or subgraphs) are.
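As a small illustration of the first bullet, the snippet below computes a few classical network measures with NetworkX and feeds them into an ordinary scikit-learn classifier; the graph, features, and labels here are toy placeholders rather than real transaction data.

```python
import networkx as nx
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy interaction graph; in practice this would come from transaction data.
G = nx.karate_club_graph()

degree     = dict(G.degree())
clustering = nx.clustering(G)   # local clustering coefficient per node
pagerank   = nx.pagerank(G)     # PageRank centrality per node

# Stack the per-node network measures into a feature matrix...
X_graph = np.array([[degree[n], clustering[n], pagerank[n]] for n in G.nodes()])

# ...optionally concatenated with non-graph transaction features,
# then used by a conventional ML model.
y = np.random.randint(0, 2, size=X_graph.shape[0])   # placeholder labels
clf = RandomForestClassifier(n_estimators=100).fit(X_graph, y)
```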

Graph topology vs attributes

Beyond the explicit graph nodes and edges introduced above, a transaction also has other numerical features that are useful for anomaly detection. In graph analysis, such information is also called graph metadata, auxiliary information, or graph attributes, as distinct from the “graph topology” or structure. Not every useful attribute expresses an explicit relationship to other entities or transactions; examples include the transaction dollar amount, the transaction label, and the items or products purchased (each product has a unique UPC or GTIN number). Let’s consider three simple options for handling such attributes:

  • One way to deal with these attributes is to discretize each of them, if needed, and treat their values as additional graph nodes. The drawback of this approach is that it makes the graph structure very large, complex, and computationally expensive. For example, an e-commerce site typically sells a huge number of items (UPCs), and even though the relationship of UPCs to other entities in the graph is potentially useful, treating them as additional graph nodes greatly increases the total graph size.
  • Another way is to incorporate such attributes as node features and apply a fully heterogeneous graph representation and analysis, in which each node carries its own attributes (sketched below); some of these attributes can also be used as edge features.
  • The third way is to use such attributes at the very end of the detection process; for example, when a GNN is used for graph classification, these attributes can be fed as additional inputs to the decision-making model that makes the final decision based on node features.
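The second option can be sketched as follows, again with NetworkX and made-up attribute values: transaction attributes stay attached to their nodes as numerical feature vectors instead of becoming nodes of their own.

```python
import networkx as nx
import numpy as np

G = nx.Graph()
# Option 2: keep transaction attributes as node features instead of
# turning each attribute value into its own node.
G.add_node("txn_123", kind="transaction",
           features=np.array([54.99,   # transaction dollar amount
                              3.0,     # number of items in the basket
                              1.0]))   # 1 if express shipping was used, else 0
G.add_node("acct_1", kind="customer_account",
           features=np.array([412.0,   # account age in days
                              27.0,    # historical order count
                              0.0]))   # prior chargeback flag
G.add_edge("txn_123", "acct_1", relation="placed_by")

# A GNN layer would read these vectors as the initial node representations.
feature_matrix = np.stack([G.nodes[n]["features"] for n in G.nodes()])
```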

Graph embedding, or “graph representation learning”

Graphs contain discrete information (nodes and edges), but most machine learning (ML) methods, including neural network (NN) techniques, operate on or prefer continuous-valued input data. To address this issue, and more importantly to allow interaction information to propagate across the graph, “graph embedding” is used: the generation of a continuous representation of the graph information combined with node attributes.

Let’s start with some terminology and an assumption. First, the “k-hop neighborhood” (neighborhood of radius k) of a node is the set of nodes at distance less than or equal to k from it. Next, the neighborhood subgraph of radius k of a node is the subgraph induced by that neighborhood together with the node itself. The main assumption in graph representation is that there is no reference point and no node ordering.

Through graph embedding, the high-dimensional graph information (the structure as well as other numerical features, including node attributes) is mapped into a lower-dimensional space. We start with a graph G = (V, E), where each node in the set V has some features and E is the set of edges. One basic approach is to concatenate all node attributes into a large numerical vector and then use a mapping or transformation function (such as an NN or a GNN) to find a lower-dimensional latent vector that encodes the node and network information. With representation learning, every node is mapped into a K-dimensional embedding vector such that nodes that are similar in the graph are close to each other in the K-dimensional embedding space. To find the embedding vector, we propagate and aggregate information across the graph, so that in the end the embedding vector of each node encodes its own information as well as the network information from all its neighboring nodes.

Graph embedding methods can be divided into several categories including matrix factorization-based, graph traversal and random walk-based, and neural network-based methods. See Perozzi, et al. (2014), and the reviews by Goyal and Ferrara (2017), Cai, et al. (2018), Mengjia Xu (2020), and Wu, et al. (2020).
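To give a flavor of the random-walk family (e.g., DeepWalk by Perozzi, et al.), the sketch below generates short uniform random walks over a NetworkX graph and trains a skip-gram model on them with gensim (assuming the gensim 4.x API); the walk length, number of walks, and embedding size are arbitrary choices for the example.

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def random_walks(G, walks_per_node=10, walk_length=8):
    """Uniform random walks, DeepWalk-style: each walk is a 'sentence' of node ids."""
    walks = []
    for _ in range(walks_per_node):
        for start in G.nodes():
            walk = [start]
            while len(walk) < walk_length:
                neighbors = list(G.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append([str(n) for n in walk])
    return walks

G = nx.karate_club_graph()
walks = random_walks(G)
# Skip-gram over the walks yields one embedding vector per node.
model = Word2Vec(sentences=walks, vector_size=32, window=5,
                 min_count=0, sg=1, workers=1, epochs=5)
embedding_of_node_0 = model.wv["0"]
```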

Once a numerical feature vector is assigned to each node, we perform an aggregation or transformation (such as averaging) over the k-hop neighborhood of each node to create a context embedding. The aggregation function operates over a set of neighboring nodes and must be permutation invariant (i.e., insensitive to node ordering).
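A minimal, hand-rolled version of this permutation-invariant aggregation (just to illustrate the idea, not a full GNN implementation) averages the feature vectors of a node’s neighbors and combines the result with the node’s own features.

```python
import numpy as np
import networkx as nx

def aggregate_once(G, features):
    """One round of mean-aggregation: each node's new vector combines its own
    features with the average of its neighbors' features (order-independent)."""
    updated = {}
    for node in G.nodes():
        neighbors = list(G.neighbors(node))
        if neighbors:
            neigh_mean = np.mean([features[n] for n in neighbors], axis=0)
        else:
            neigh_mean = np.zeros_like(features[node])
        updated[node] = np.concatenate([features[node], neigh_mean])
    return updated

G = nx.path_graph(4)                               # toy graph: 0-1-2-3
feats = {n: np.random.rand(4) for n in G.nodes()}  # initial node features
# Applying the step k times lets information flow in from the k-hop neighborhood.
h1 = aggregate_once(G, feats)                      # vectors of length 8
```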

Figure 2. Each node of the graph is represented by a feature vector or embedding vector.

Summary of Part 1

Using graph embeddings and GNN methods for anomaly, abuse, and fraud detection allows us to extract valuable interaction network and graph data and feed it as input to the machine learning model. ML methods that employ this connection data can improve detection efficacy compared to conventional methods. In this article we looked at how e-commerce network data carries useful information for anomaly detection tasks. In Part 2 of this blogpost, we will continue the discussion, briefly introduce a few popular graph embedding and GNN methods, and touch on the challenges of deploying such methods on real-world problems.

I would like to thank Henry Chen, Priya Venkat and James Tang (of Walmart) for providing their thoughtful comments, which helped to improve this article.
