TXN: Transaction Categorisation with Deep(walk)Trax: Embedding Graphs of Financial Transactions— Part One

Shaurya Uppal
Nerd For Tech
Published in
6 min readMay 27, 2021

Many people are in dark when it comes to money, and I’m going to turn on the light.

Transaction between Fred Flintstone (receiver) and Barney Rubble (sender)

Background: To my knowledge the method proposed in DeepTrax paper (check below post) is the first application of graph-embeddings to financial transactions. It shares a methodology to construct embedding of user and merchant on a financial graph, further using these embeddings one can perform other downstream tasks like merchant recommendation for rewards (google pay merchant reward scratch card), similar merchants grouping, transaction categorization with meta info on the grouped cluster, fraud use cases, etc.

ABSTRACT

Financial transactions can be considered edges in a heterogeneous graph between entities sending money and entities receiving money.
This graph is likely to be very large have millions of nodes (entities) and billions of edges (transactions) while also sparsely connected. It becomes challenging to apply machine learning to such large and sparse graphs. To simplify our use case and capture only merchant embeddings we choose credit card txns data where the user is a money-spending entity and the merchant is a money receiving entity (in most cases), hence this network is a bipartite graph. In this bipartite graph, graph embeddings techniques: DeepWalk is applied to capture merchant embeddings quantified by link prediction AUC and F1 score. Further, the resulting entity vectors retain intuitive semantic similarity that is explored through visualizations and other qualitative analyses. We then use these merchant embeddings to perform a downstream task like merchant (transaction) categorization.

The Blog is structured in the following manner- Section II discusses the Bipartite Graph and its Projection. Preprocessing to Pair Transactions is discussed in section III. Deepwalk technique on the processed graph in detail is discussed in section IV. Section V discusses the Loss Function. Followed by Some Interesting Experimental Learnings in section VI and the methodology of apply merchant embeddings to a downstream task like Transaction Categorisation in Section VII. Section VIII provides a detailed outcome of the performed experiment and finally concluding remarks.

II. Bipartite Graph and its Projection

The credit card transaction graph is a bipartite graph between accounts and merchants, with a transaction forming an edge. A bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets in our case Users Set and Merchants Set such that every edge connects a vertex in User Set to one in Merchant Set. (See Fig.1)

Fig.1 Bipartite Graph of Credit card transactions

In a transaction database (see fig.2), the metapath of the above bipartite graph is stored as {Account, Merchant, Account}, given this schema, we would only consider graph walks that adhere to such triplets: identifying accounts as similar because they shop at the same merchant (and vice versa).

We can break this bipartite graph into two homogeneous projections:
i) a user accounts graph
ii) a merchant graph

For our use case, we will only focus on creating a merchant graph. We can apply the random walk technique on this homogeneous projection of merchant graph, i.e. deepwalk to get merchant embedding.

Fig.2 Transactional Database

III. Pre-processing Stage: Modeling a Graph as Pairs of Transactions

In the above-discussed projection graph of a merchant, an edge between two merchant vertices represents the presence of at least one account that made transactions at both merchants within a specified time window (we have set it to 50secs, see fig.3 (a) and (b)).

Fig.3 Model pipeline, from data pre-processing to training with Skip-gram. Using stringent time windows and pairs of transaction pairs (example time window, t = 50 seconds) allows for creating meaningful embeddings on graphs with millions of unique entities.

After transaction pair formulation, we now have a weighted merchant projection graph (see fig.4, projection merchant graph based on above transaction pairing) where weights are calculated based on how often an account shop at the same two merchants within a fixed time window.

Fig.4 Merchant Projection Graph for (Fig.3)

IV. Deepwalk (graph embedding technique) on Merchant Projection Graph

Fig.5 Graphical representation of the stepwise implementation

First, we pair transactions within a specific time (our case T=50secs) window then we map those creating a projection of merchant graph (b).

To combat non-homogeneity, we use Markov chain methodology which relies only on the present state to generate the next state.

We calculate transition probabilities between merchants, using unique transaction pair occurrence counts.

probability of going from merchant i to j. M is unique transitions. 0 if no transaction has navigated from merchant i to j (in our specified time window T).

Then we apply random walk on graph (b) using the transition probabilities generation multiple merchant sequences (c) and then apply skip-gram negative sampling technique (d) (see Fig.5) to find merchant embedding (the similar methodology is used if we want to generate user-embedding).

Fig.5 Negative Sampling on Word Token (image credit jay alammar blog)

In our case in Fig.5 instead of word token, we will have merchant account ids as our tokens of the skip-gram model. (this is a modification, I did for my use case, on paper they use raw merchant names or brand names as merchant tokens).

Fig.6 Phases of Deepwalk

V. Loss Function

The optimization problem and the loss function is given by

Fig.7 Loss Function

where m ∈ V denotes a merchant selected from a set of unique merchant entities, V, the mapping function φ: m ∈ V 􏰀→ Rd retrieves the embedding representation for any given merchant, m_pos denotes a positive sample merchant for a given merchant, and m_neg denotes a negative sample chosen from V and k is the number of negative samples.

We are halfway through now to study and perform the transaction categorization task.

Next, we shall discuss:

  • Section VI: Some Interesting Experimental Learnings
  • Section VII: the methodology of apply merchant embeddings to a downstream task like Transaction Categorisation
  • Section VIII: provides a detailed outcome of the performed experiment, metrics, and finally concluding remarks.

I hope you enjoyed this blog and learned something! I will be back with Part Two soon. [TO BE CONTINUED…]

--

--

Nerd For Tech
Nerd For Tech

Published in Nerd For Tech

NFT is an Educational Media House. Our mission is to bring the invaluable knowledge and experiences of experts from all over the world to the novice. To know more about us, visit https://www.nerdfortech.org/.

Shaurya Uppal
Shaurya Uppal

Written by Shaurya Uppal

Lead Data Scientist @ The Weather Company | ex-Makro, Inmobi, 1mg, epiFi | https://www.linkedin.com/in/shaurya-uppal/ | Shauryauppal97@gmail.com