TXN: Transaction Categorisation with Deep(walk)Trax: Embedding Graphs of Financial Transactions— Part One
Many people are in dark when it comes to money, and I’m going to turn on the light.
Background: To my knowledge the method proposed in DeepTrax paper (check below post) is the first application of graph-embeddings to financial transactions. It shares a methodology to construct embedding of user and merchant on a financial graph, further using these embeddings one can perform other downstream tasks like merchant recommendation for rewards (google pay merchant reward scratch card), similar merchants grouping, transaction categorization with meta info on the grouped cluster, fraud use cases, etc.
ABSTRACT
Financial transactions can be considered edges in a heterogeneous graph between entities sending money and entities receiving money.
This graph is likely to be very large have millions of nodes (entities) and billions of edges (transactions) while also sparsely connected. It becomes challenging to apply machine learning to such large and sparse graphs. To simplify our use case and capture only merchant embeddings we choose credit card txns data where the user is a money-spending entity and the merchant is a money receiving entity (in most cases), hence this network is a bipartite graph. In this bipartite graph, graph embeddings techniques: DeepWalk is applied to capture merchant embeddings quantified by link prediction AUC and F1 score. Further, the resulting entity vectors retain intuitive semantic similarity that is explored through visualizations and other qualitative analyses. We then use these merchant embeddings to perform a downstream task like merchant (transaction) categorization.
The Blog is structured in the following manner- Section II discusses the Bipartite Graph and its Projection. Preprocessing to Pair Transactions is discussed in section III. Deepwalk technique on the processed graph in detail is discussed in section IV. Section V discusses the Loss Function. Followed by Some Interesting Experimental Learnings in section VI and the methodology of apply merchant embeddings to a downstream task like Transaction Categorisation in Section VII. Section VIII provides a detailed outcome of the performed experiment and finally concluding remarks.
II. Bipartite Graph and its Projection
The credit card transaction graph is a bipartite graph between accounts and merchants, with a transaction forming an edge. A bipartite graph (or bigraph) is a graph whose vertices can be divided into two disjoint and independent sets in our case Users Set and Merchants Set such that every edge connects a vertex in User Set to one in Merchant Set. (See Fig.1)
In a transaction database (see fig.2), the metapath of the above bipartite graph is stored as {Account, Merchant, Account}, given this schema, we would only consider graph walks that adhere to such triplets: identifying accounts as similar because they shop at the same merchant (and vice versa).
We can break this bipartite graph into two homogeneous projections:
i) a user accounts graph
ii) a merchant graph
For our use case, we will only focus on creating a merchant graph. We can apply the random walk technique on this homogeneous projection of merchant graph, i.e. deepwalk to get merchant embedding.
III. Pre-processing Stage: Modeling a Graph as Pairs of Transactions
In the above-discussed projection graph of a merchant, an edge between two merchant vertices represents the presence of at least one account that made transactions at both merchants within a specified time window (we have set it to 50secs, see fig.3 (a) and (b)).
After transaction pair formulation, we now have a weighted merchant projection graph (see fig.4, projection merchant graph based on above transaction pairing) where weights are calculated based on how often an account shop at the same two merchants within a fixed time window.
IV. Deepwalk (graph embedding technique) on Merchant Projection Graph
First, we pair transactions within a specific time (our case T=50secs) window then we map those creating a projection of merchant graph (b).
To combat non-homogeneity, we use Markov chain methodology which relies only on the present state to generate the next state.
We calculate transition probabilities between merchants, using unique transaction pair occurrence counts.
Then we apply random walk on graph (b) using the transition probabilities generation multiple merchant sequences (c) and then apply skip-gram negative sampling technique (d) (see Fig.5) to find merchant embedding (the similar methodology is used if we want to generate user-embedding).
In our case in Fig.5 instead of word token, we will have merchant account ids as our tokens of the skip-gram model. (this is a modification, I did for my use case, on paper they use raw merchant names or brand names as merchant tokens).
V. Loss Function
The optimization problem and the loss function is given by
where m ∈ V denotes a merchant selected from a set of unique merchant entities, V, the mapping function φ: m ∈ V → Rd retrieves the embedding representation for any given merchant, m_pos denotes a positive sample merchant for a given merchant, and m_neg denotes a negative sample chosen from V and k is the number of negative samples.
We are halfway through now to study and perform the transaction categorization task.
Next, we shall discuss:
- Section VI: Some Interesting Experimental Learnings
- Section VII: the methodology of apply merchant embeddings to a downstream task like Transaction Categorisation
- Section VIII: provides a detailed outcome of the performed experiment, metrics, and finally concluding remarks.
I hope you enjoyed this blog and learned something! I will be back with Part Two soon. [TO BE CONTINUED…]