Consumer Debt Delinquency Detection on DGraph-Fin

By: Harry Ko and Khalid El-Awady, for Stanford CS224W course project Winter 2023


Colabs:

  1. Main colab: Here
  2. Motif generation: Here

Overview and Motivation

Predicting, detecting, and managing delinquencies is key to successful credit lending [1]. Delinquency refers to late payment on a debt, typically 30 days or more past the due date, with 'serious' delinquency referring to payments more than 90 days late [2]. TransUnion forecasts a serious US consumer credit card delinquency rate of 2.60% at the end of 2023, and a serious unsecured personal loan delinquency rate of 4.30% over the same time frame [3]. The New York Fed reports that at the end of Q3 2022, more than $150B of US consumer debt was in serious delinquency [4].

We construct a graph neural network (GNN) to predict delinquencies using the DGraph-Fin dataset [5], a large-scale, real-world graph derived from actual consumer lending data and released by its authors to further GNN research. The authors maintain a leaderboard on their website with baseline performances of general GNN and graph anomaly detection methods. Starting from a standard GCN baseline, we apply a sequence of improvements that yields a very competitive model, placing us second on the leaderboard, not far behind first place and comfortably ahead of the (now) third-place model.

The DGraph-Fin Dataset

DGraph-Fin is a dynamic, publicly available graph dataset derived from delinquency data provided by FinVolution Group [6], a pioneer in China's online consumer finance industry. In fiscal year 2021 FinVolution had more than 140 million registered consumers and 14 million borrowers, with a total transaction volume of RMB 137.3B (~$21.5B).

FinVolution borrowers are required to provide an emergency contact. Sometimes these contacts are themselves FinVolution borrowers and sometimes they are not; in the latter case they are referred to as 'background' users. Together, borrowers and background users form the nodes of DGraph-Fin. The graph is directed, with each emergency-contact relationship represented as a directed edge from borrower to contact. In total the graph comprises 3.7M nodes and 4.3M directed edges; 32% of the nodes are borrowers and the remainder are background nodes.

Borrower nodes have a 17-dimensional feature vector derived from profile information submitted at application time, such as age and gender, though the actual values are masked to preserve privacy. Roughly half of the feature values are missing. Among the borrower nodes, 1.27% are labeled as delinquent (i.e., they missed a payment during the study period). The dataset also refers to delinquent nodes as 'fraudulent' nodes, and we use the terms fraud and delinquency interchangeably here.

The figure below summarizes some key statistics of the graph. We see that the graph is sparse, with a low average degree of 2.1 edges per node.

Basic network statistics for DGraph-Fin.
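
To reproduce a statistic like the average degree, a minimal sketch is shown below, assuming PyG's bundled DGraphFin loader (the loader and root path here are assumptions; our colab loads the dataset from the files distributed by the DGraph team):

```python
import torch
from torch_geometric.datasets import DGraphFin   # assumed loader
from torch_geometric.utils import degree

data = DGraphFin(root='data/dgraphfin')[0]        # hypothetical root path

# Total (in + out) degree of every node in the directed graph.
deg = degree(data.edge_index[0], data.num_nodes) + \
      degree(data.edge_index[1], data.num_nodes)

print(f'nodes: {data.num_nodes:,}  edges: {data.edge_index.size(1):,}')
print(f'average degree: {deg.mean().item():.2f}')
```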

A visualization of a portion of the graph is shown below. The full graph is too large to plot, so only a sample of the nodes and edges is given. The graph contains numerous background nodes that sometimes act as contacts but often don't. Also, because the average node degree is small, there are few edges among the sampled nodes.

Subgraph of DGraph-Fin. The graph is characterized by numerous ‘background’ nodes that sometimes act as contacts and sometimes don’t. The graph also has a small (~1.3%) portion of delinquent / fraudulent nodes shown in red.

Modeling Approach

We take the following modeling approach:

  1. Investigate an appropriate GNN architecture and tune it to optimize node label prediction accuracy (as measured by ROC-AUC score on the test set).
  2. Investigate improvements in accuracy through pre-processing using feature engineering and augmentation.
  3. Investigate the achievable improvement in accuracy from post-processing using correct-and-smooth.
  4. Present the final model and compare its performance to the current DGraph leaderboard.

The main evaluation metric is the ROC-AUC score on the test set; we refer to this score with the generic term 'accuracy'. Below is a screenshot of the current state of the leaderboard. The earliest entry uses a basic GCN-based GNN and achieves node-level prediction accuracy of around 70%. The current state of the art (achieved in November 2022) is 84.6%. We use these as benchmarks when evaluating the performance of our models.

Leaderboard for DGraph-Fin at time of writing (March 2023).

GNN Architectures

GCN

We begin with a baseline model using the GCN structure we explored in Colab 2 of CS224W, which employs the GNN architecture shown below.

General GCN model architecture (credit: here).

The graph convolutional network (GCN) layer updates node embeddings by aggregating information from neighboring nodes, as defined by the computational graph. Initially, the node features serve as the node embeddings. At every layer, information from the immediate neighbors is passed along the edges and aggregated to construct the updated embedding for the target node. The final embeddings are then used for downstream prediction tasks such as node classification. These steps are illustrated in the figures below.

Conceptual approach to graph convolutional networks (source: [12]).
Main update equation for node embeddings in a GCN layer.
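
As a restatement of that update (not a transcription of the figure), the GCN layer we use, as implemented by PyG's GCNConv, computes in matrix form

$$
H^{(l+1)} = \sigma\left(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right),
$$

where $\tilde{A} = A + I$ is the adjacency matrix with added self-loops, $\tilde{D}$ is its degree matrix, $H^{(l)}$ stacks the layer-$l$ node embeddings (with $H^{(0)}$ the input features), $W^{(l)}$ is a learnable weight matrix, and $\sigma$ is a non-linearity.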

Using standard deep learning tuning techniques we optimize the model for node label prediction accuracy on the test set. Our optimal architecture has the following hyper-parameters (a sketch of the corresponding model follows the list):

  • Number of GCN layers: 5
  • Hidden dimension: 64 (recall the input has feature dimension of 17)
  • Dropout probability: 0.3
  • Learning Rate: 0.01
  • Num Epochs: 1000
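
For concreteness, a minimal sketch of a model with these hyper-parameters is shown below, assuming PyG's GCNConv (the class and variable names are illustrative rather than a copy of our colab):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv

class GCN(torch.nn.Module):
    def __init__(self, in_dim=17, hidden_dim=64, out_dim=2,
                 num_layers=5, dropout=0.3):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [out_dim]
        self.convs = torch.nn.ModuleList(
            GCNConv(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:]))
        self.dropout = dropout

    def forward(self, x, edge_index):
        for conv in self.convs[:-1]:
            x = F.dropout(F.relu(conv(x, edge_index)),
                          p=self.dropout, training=self.training)
        return self.convs[-1](x, edge_index)   # class logits (fraud / non-fraud)

model = GCN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # trained for ~1000 epochs
```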

With this basic architecture we achieve an accuracy of 75.28%, which places us 7th on the leaderboard. We attribute the improvement over the original GCN leaderboard entry (~70%) to careful tuning of the hyper-parameters.

SAGE

Next we employ the GraphSAGE model [12] with similar hyper-parameters. GraphSAGE follows a similar approach to GCN. The main difference between the two methods is that, in addition to aggregating information from neighboring nodes in the computational graph, GraphSAGE also incorporates the target node's own embedding from the previous layer, effectively a skip connection to itself.

Main update equation for node embeddings in a Graph SAGE layer.
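
In one common form (mean aggregation; again a restatement rather than a transcription of the figure), the GraphSAGE update is

$$
h_v^{(l+1)} = \sigma\left(W^{(l)} \cdot \big[\, h_v^{(l)} \,\|\, \mathrm{mean}_{u \in N(v)}\, h_u^{(l)} \,\big]\right),
$$

where $\|$ denotes concatenation, so the node's own previous embedding enters the update directly alongside the aggregated neighborhood, which is the skip connection described above.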

We see that our model performance improves to 78.58%, raising us to 2nd place in the rankings. Since this model architecture seems to perform well without any pre or post processing, and noting that the leader also employs a SAGE-based architecture, we now turn our attention to investigate improvements achievable from pre- and post-processing rather than exploring other potential GNN architectures.
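
For completeness, the corresponding model sketch differs from the GCN sketch above only in the convolution layer, again assuming PyG's SAGEConv:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class SAGE(torch.nn.Module):
    """Same shape as the GCN sketch above, with SAGEConv in place of GCNConv."""
    def __init__(self, in_dim=17, hidden_dim=64, out_dim=2,
                 num_layers=5, dropout=0.3):
        super().__init__()
        dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [out_dim]
        self.convs = torch.nn.ModuleList(
            SAGEConv(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:]))
        self.dropout = dropout

    def forward(self, x, edge_index):
        for conv in self.convs[:-1]:
            x = F.dropout(F.relu(conv(x, edge_index)),
                          p=self.dropout, training=self.training)
        return self.convs[-1](x, edge_index)
```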

Pre-Processing: Feature Augmentation

The dataset's default 17 node features are described as traditional demographic-type features such as age and gender, though these are anonymized for privacy reasons. The dataset also includes additional edge features: a timestamp representing when the edge was formed, and an anonymized edge type taking one of 11 possible values. Before augmenting with these, we explore a novel feature of our own.

Intuitively, augmenting node features with network-topological information seems a fruitful direction: fraudulent behavior has been shown to exhibit homophily in numerous settings [9, 10]. As a quick sanity check, we augment the feature set with the node degree, as shown below, and measure an accuracy improvement of roughly 1% for our SAGE model. This motivates a deeper exploration of network-topology measures in our feature set.

Code for augmenting features with node degree.
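
The post embeds that code as an image; the snippet below is a minimal reconstruction of the same idea, assuming a PyG Data object named data (variable names are illustrative):

```python
import torch
from torch_geometric.utils import degree

def add_degree_feature(data):
    """Append each node's total (in + out) degree as an extra feature column."""
    deg = degree(data.edge_index[0], data.num_nodes) + \
          degree(data.edge_index[1], data.num_nodes)
    data.x = torch.cat([data.x, deg.unsqueeze(1)], dim=1)
    return data
```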

Motifs: A Novel DGraph-Fin Feature Set

We introduce the use of motifs [11] as a novel feature that, to our knowledge, has not been tried in the DGraph-Fin setting before. Motifs are unique subgraph patterns of a certain neighborhood size anchored at a given node. Since the average node degree is close to 2, and for simplicity of computation, we begin with 3-node motifs and enumerate below all possible variations in which one node is designated the 'anchor' node of interest and the other two nodes are not distinguished. This generates a set of 21 possible motifs.

Unique 3-node motifs for a directed graph with a defined anchor node and two neighbors where the neighbors are not distinguished. Here we enumerate all 3-node patterns in a directed graph and assign a label. Labels in gray denote non-unique patterns based on undistinguished neighbors.

For our augmented features we assign to each node a vector of length 21 (one entry per motif), where each element is the count of occurrences of the corresponding motif anchored at that node. The code for this is given below.

Code for creating the motif dictionary
Code for getting motif pattern of a specific 3-node subgraph anchored at a specific node.
Code for generating the motif feature matrix consisting of counts of motifs anchored at each node.
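
Those gists are embedded as images in the post; as a self-contained illustration of the counting logic, the sketch below keys each anchored 3-node pattern by a canonical tuple of edge directions rather than by the fixed 0-20 labels used in our colab (the two keyings are equivalent: with the neighbor-swap canonicalization the tuples collapse to exactly 21 classes):

```python
from collections import defaultdict
from itertools import combinations

def motif_counts(edges):
    """Count canonical 3-node motif patterns anchored at each node.

    edges: iterable of (src, dst) directed edges (e.g. edge_index.t().tolist()).
    Returns: dict mapping node -> {canonical motif key: count}.
    """
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for s, d in edges:
        out_nbrs[s].add(d)
        in_nbrs[d].add(s)

    def rel(a, b):
        # Direction of the edge(s) between a and b: 'out', 'in', 'both' or 'none'.
        fwd, bwd = b in out_nbrs[a], b in in_nbrs[a]
        return 'both' if fwd and bwd else 'out' if fwd else 'in' if bwd else 'none'

    counts = defaultdict(lambda: defaultdict(int))
    for v in set(out_nbrs) | set(in_nbrs):
        for u, w in combinations(sorted(out_nbrs[v] | in_nbrs[v]), 2):
            key = (rel(v, u), rel(v, w), rel(u, w))
            mirrored = (rel(v, w), rel(v, u), rel(w, u))
            counts[v][min(key, mirrored)] += 1   # neighbors are undistinguished
    return counts

# Example: node 0 is named as a contact by 1 and 2, and names 3 as its contact.
print(dict(motif_counts([(1, 0), (2, 0), (0, 3)])[0]))
# {('in', 'in', 'none'): 1, ('in', 'out', 'none'): 2}
```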

To understand the potential expressiveness of these motifs as features, we plot the histogram of motif counts for each of the two node classes, fraud and non-fraud, as shown below. We see a clear difference in the prevalence of certain patterns. In particular, anchor nodes with predominantly incoming edges are more common among fraud nodes (for example, motif "0" is 2.3x more prevalent in fraud nodes than in non-fraud nodes). Nodes with predominantly outgoing or two-way edges, on the other hand, are more common among non-fraud nodes (for example, motif 18 is 40% more prevalent in non-fraud nodes). Since edges in our graph indicate that a user is acting as a 'contact' (i.e., reference) for another borrower, this suggests that fraudulent borrowers disproportionately often act as contacts for other borrowers.

Motif histogram for fraud and non-fraud nodes showing a higher prevalence of predominantly incoming-edge motifs for fraud nodes and more outgoing or two-way edges for non-fraud nodes.
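
A comparison like this can be computed directly from the motif feature matrix; a short sketch is shown below, assuming motif_x is the N x 21 count matrix built above and y holds the node labels (1 = fraud, 0 = non-fraud):

```python
import torch

def motif_prevalence_ratio(motif_x, y):
    """Average motif counts for fraud vs. non-fraud nodes, as a per-motif ratio."""
    fraud_mean = motif_x[y == 1].float().mean(dim=0)
    clean_mean = motif_x[y == 0].float().mean(dim=0)
    return fraud_mean / clean_mean.clamp(min=1e-9)   # > 1: more common for fraud nodes
```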

We test the improvement in node prediction accuracy with the augmented motif features and see a 1.8% gain, to 80.38%. This is a significant improvement and nearly twice that achieved by using node degree alone.

Edge Feature Augmentation

The dataset provides additional edge features that characterize the relationship between borrowers and their emergency contacts. We use two sets of edge features. The first is a temporal feature specifying, in days, when each edge was added (i.e., when the borrower added the emergency contact to his or her contact list).

The second is an anonymized edge-type feature that classifies each relationship into one of 11 categories. As additional node features we include the sum and maximum of the temporal feature, and counts of each relationship type, taken across all edges involving a particular node. Including these features increases our accuracy by a further 1.73%, to 82.11%.
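
A sketch of that aggregation is shown below, assuming edge_timestamp and edge_type tensors alongside edge_index on the Data object (the attribute names and the restriction to outgoing edges are assumptions; the colab's exact treatment may differ):

```python
import torch
import torch.nn.functional as F

def add_edge_aggregates(data, num_edge_types=11):
    """Append per-node sum/max of edge timestamps and per-type edge counts."""
    src = data.edge_index[0]              # borrower end of each directed edge
    t = data.edge_timestamp.float()       # days at which the contact was added
    n = data.num_nodes

    time_sum = torch.zeros(n).index_add_(0, src, t)
    time_max = torch.zeros(n).scatter_reduce_(
        0, src, t, reduce='amax', include_self=False)

    # Count how many edges of each of the 11 types touch the node (0-indexed types assumed).
    type_counts = torch.zeros(n, num_edge_types).index_add_(
        0, src, F.one_hot(data.edge_type.long(), num_edge_types).float())

    data.x = torch.cat(
        [data.x, time_sum.unsqueeze(1), time_max.unsqueeze(1), type_counts], dim=1)
    return data
```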

Post-Processing

We employ the correct-and-smooth technique [7] explored in class, with the implementation shown below. Correct & Smooth is a post-processing method that adjusts predicted label probabilities using the graph structure and the ground-truth labels of surrounding training nodes. The key idea is that prediction errors are correlated along the edges of the graph and that connected nodes tend to share the same label.

Correct & Smooth consists of two steps. In the Correct step, the training errors of the node label predictions, i.e., the residuals between the predicted class probabilities and the ground-truth labels on training nodes, are diffused along the edges. In the Smooth step, the corrected predictions from the first step are further smoothed along the edges using the ground-truth labels of neighboring training nodes.

Code for applying correct-and-smooth post-processing.
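
The post embeds that code as a gist; a minimal sketch of the same step using PyG's CorrectAndSmooth module is shown below (the layer counts and alphas are placeholders rather than our tuned values, and model, data, and train_idx refer to the trained SAGE model, graph data, and training-node indices from the sketches above):

```python
from torch_geometric.nn import CorrectAndSmooth

cs = CorrectAndSmooth(num_correction_layers=50, correction_alpha=0.8,
                      num_smoothing_layers=50, smoothing_alpha=0.8,
                      autoscale=True)

y_soft = model(data.x, data.edge_index).softmax(dim=-1)      # base GNN predictions
y_soft = cs.correct(y_soft, data.y[train_idx], train_idx, data.edge_index)
y_soft = cs.smooth(y_soft, data.y[train_idx], train_idx, data.edge_index)
```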

Tuning the parameters of the method provides a small improvement of approximately 0.1%, which we nonetheless retain.

End-to-End Model and Comparison to Leader

We summarize the end-to-end process and accuracy gains in the waterfall image below. This shows the impact of our design choices starting with our baseline GCN model prediction accuracy of 75.28%.

  • Model architecture improvements: +3.3%
  • Pre-processing — motif feature augmentation: +1.8%
  • Pre-processing — edge timestamp and class feature augmentation: +1.73%
  • Post-processing — correct & smooth: +0.1%

Waterfall improvements in node label prediction accuracy for our model relative to the state-of-the-art leaderboard.

Our final accuracy of 82.16% compares quite favorably with the current leaderboard, placing us second behind the leader at 84.6% (2.44% behind) and comfortably ahead of the (now) third-place model at 78.38% (a 3.8% lead!).

While we believe we achieved impressive performance, our model's accuracy still lags the leader's by about 2.5 percentage points. Inspection of the leader's code shows that the main difference lies in their incorporation of edge embeddings into the message-passing function of their SAGE network. We would characterize this as both an architectural difference and additional feature augmentation; incorporating it into our model would be an obvious next step.

Conclusions

GNNs are a natural fit for financial environments where relations between entities are a strong indicator of outcomes, as with the relations between borrowers and their contacts in our case. The models show strong predictive power. Our study also shows that feature engineering on both node and edge attributes can enhance performance significantly. Another avenue to explore is enriching our graph's financial relations with the abundance of other social relational data between the same entities, which is becoming steadily easier to access.

A couple of challenges, though, need to be addressed for use of these models in practice:

  1. These predictive models utilize data accumulated over time. In practice, as in our consumer lending example, we often have to decide whether to lend to a new borrower with limited information at the outset, and once the decision is made there is little ability to retract the loan if the borrower later appears riskier.
  2. We classify nodes into a hard delinquent / not-delinquent (fraud / non-fraud) binary. In reality there are degrees of delinquency, and some borrowers who go delinquent later make good on payments, in full or in part. A more continuous measure of delinquency may therefore be better suited to characterizing these nodes.

References

[1] Skibicki, A. and Kropp, E. "Credit Portfolio Management — The Ultimate Guide." Experian, 2022.

[2] Maxwell, T. "What is a Delinquency on a Credit Report?" Experian, 2022.

[3] TransUnion. "More Pronounced Changes Expected in Consumer Credit Market in 2023 Even as More Than Half of Americans Remain Optimistic About Their Financial Future." 2023.

[4] Federal Reserve Bank of New York. "Total Household Debt Reaches $16.51 Trillion in Q3 2022; Mortgage and Auto Loan Originations Decline." 2022.

[5] Huang, X., Yang, Y., Wang, Y., Wang, C., Zhang, Z., Xu, J., and Chen, L. "DGraph: A Large-Scale Financial Dataset for Graph Anomaly Detection." arXiv:2207.03579, 2022.

[6] FinVolution Group. Form 20-F Annual Report, 2021.

[7] Huang, Q., He, H., Singh, A., Lim, S., and Benson, A. "Combining Label Propagation and Simple Models Out-performs Graph Neural Networks." arXiv:2010.13993, 2020.

[8] Xu, K., Hu, W., Leskovec, J., and Jegelka, S. "How Powerful are Graph Neural Networks?" arXiv:1810.00826, 2019.

[9] Baesens, B., Van Vlasselaer, V., and Verheyen, W. "Networks vs. Fraud: Connecting the Dots." Analytics, 2015.

[10] Barone, M. and Coscia, M. "Birds of a Feather Scam Together: Trustworthiness Homophily in a Business Network." Social Networks, 54, July 2018.

[11] Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D., and Alon, U. "Network Motifs: Simple Building Blocks of Complex Networks." Science, Vol. 298, Issue 5594, pp. 824–827, 25 Oct 2002.

[12] Hamilton, W., Ying, R., and Leskovec, J. "Inductive Representation Learning on Large Graphs." arXiv:1706.02216, 2018.
