Streamlining transaction categorisation at scale — Part 1

Mauriciotorob
Jul 10, 2024

Introduction

Cheddar, since its launch in 2022, has aimed to transform the banking landscape by democratising rewards and simplifying payments. Our commitment to offering financial solutions helps customers manage their money effectively. A standout feature is our cashback program, lauded for its simplicity and instant rewards on everyday purchases, earning us high praise and contributing to our recent award win.

Customer satisfaction is paramount at Cheddar. Winning the ‘Best Newcomer’ award in 2024 reflects not just our product innovation but also the trust and loyalty of our users. We prioritise listening to customer feedback to evolve and enhance our services.

Looking ahead, this award marks just the beginning of our journey. We remain dedicated to making banking more accessible and rewarding. Exciting plans are in store, including new features to elevate the Cheddar experience. One feature we will launch soon is a Personal Finance Manager.

A Personal Finance Manager (PFM) is software that helps individuals manage their finances; it’s like a digital assistant for your money. At Cheddar, we set out to create a PFM that works seamlessly across a user’s various bank accounts. Unlike some PFMs that are exclusive to specific banks, ours is designed to be versatile and subscription-free, so that users can optimise their personal finances and maximise savings in categories like food and drinks, fashion, and transportation.

Transaction categorisation is at the heart of our PFM development. It involves automatically classifying bank transactions into categories like entertainment, clothing, or food. Our collaborative effort uses machine-learning models, Natural Language Processing (NLP) techniques, and an Artificial Intelligence (AI) language model to categorise transactions. This series describes the implementation journey of transaction categorisation in four parts.

The Four Parts of the Implementation Journey of Transaction Categorisation

Part 1 — User-Centric Insights and Transaction Type Mapping

  1. User Empathy: Understood user needs and identified key spend categories such as shoes.
  2. Transaction Type Mapping: Associated transaction types with spend categories.

Part 2 — Data cleaning and mappings

  1. Merchant Name Extraction: Extracted merchant names for card payments and direct debits.
  2. Retailer Mapping: Linked merchant names to common retailers like The Trainline and Just Eat.
  3. Category Code Mapping: Defined a mapping of merchant category codes into spend categories.

Part 3 — Adding AI to make it sing

  1. Numerical Embedding: Selected an embedding that translates a merchant name into numbers.
  2. Machine Learning Model: Developed a machine learning model to predict categories for transactions that lack a merchant category code or come from an uncommon retailer.
  3. Human Testing: Conducted human testing to validate model predictions and ensure accuracy.

Part 4 — Going to production

  1. API Creation: Integrated all components into an Application Programming Interface (API).
  2. Cloud Deployment: Deployed the API in the cloud, making it accessible to the Cheddar app.

Part 1 — User-Centric Insights and Transaction Type Mapping

  1. User Empathy

At the forefront of our customer experience strategy, under the guidance of co-founder Luke Ladyman, we set out to connect with our users. Our goal was to uncover what they expected from our PFM. One insight emerged from these interactions: users preferred a more nuanced categorisation system. Specifically, they wanted distinct categories for quick eats, restaurants, and pubs, rather than a single generic dining category.

The next step was crafting our user-centric approach. Taking the lead from our co-founder and CEO, Tariq Zaid, whose experience includes leading product teams at Shopify, we focused on making Cheddar’s PFM useful for our users. The objective was clear: to define categories that empower users to make informed financial decisions. For instance, broad categories like financial, general, or shopping were deemed less practical, as they offer limited actionable insights. Instead, we opted for a hierarchical classification approach. This includes segmenting bills into subscriptions (e.g., Netflix, Spotify), utilities (e.g., EDF, Wessex Water), and loans (e.g., credit card payments, Klarna). Similarly, fashion was subdivided into categories like clothing, accessories, jewellery, and shoes. Transportation was also broken down in detail, into trains, commuting, fuel, and taxis and cars. The result is a comprehensive structure of 57 categories grouped into 11 clusters.
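As a rough illustration of the hierarchy described above, the sketch below stores a two-level structure (cluster, then category) and rolls category-level spend up to cluster level. It lists only the clusters and categories named in this post; the identifiers `TAXONOMY`, `cluster_of`, and `spend_by_cluster` are hypothetical and not taken from the Cheddar codebase.

```python
from typing import Dict, List

# Illustrative excerpt only: the full structure has 57 categories in 11 clusters.
TAXONOMY: Dict[str, List[str]] = {
    "bills": ["subscriptions", "utilities", "loans"],
    "fashion": ["clothing", "accessories", "jewellery", "shoes"],
    "transportation": ["trains", "commuting", "fuel", "taxis and cars"],
}

# Reverse index so that a category can be resolved to its parent cluster.
CLUSTER_OF: Dict[str, str] = {
    category: cluster
    for cluster, categories in TAXONOMY.items()
    for category in categories
}


def cluster_of(category: str) -> str:
    """Return the cluster that a category belongs to."""
    return CLUSTER_OF[category]


def spend_by_cluster(spend_by_category: Dict[str, float]) -> Dict[str, float]:
    """Aggregate category-level spend into cluster-level totals."""
    totals: Dict[str, float] = {}
    for category, amount in spend_by_category.items():
        cluster = CLUSTER_OF[category]
        totals[cluster] = totals.get(cluster, 0.0) + amount
    return totals
```

Keeping the hierarchy in one place means a user can see spend at the fine-grained level (shoes) or the broad level (fashion) without categorising a transaction twice.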

  2. Transaction Type Mapping

During this phase, our engineering team, under the leadership of co-founder Tariq Zaid, developed a transaction parser. This tool extracts essential information (amount, date, merchant name, and transaction type) from transactions across the 21 banks integrated into the Cheddar app. Our engineers collaborated with data scientists to map transaction types, including direct debits, ATM withdrawals, cheques, bank fees, and card payments, to the spending categories established earlier.
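To make the parser's output concrete, here is a minimal sketch of a record holding the four fields mentioned above. The raw input keys (`amount`, `date`, `merchant`, `type`) and all class and function names are assumptions made for illustration; the actual payloads differ from bank to bank and are not described in this post.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from enum import Enum
from typing import Optional


class TransactionType(Enum):
    DIRECT_DEBIT = "direct_debit"
    ATM_WITHDRAWAL = "atm_withdrawal"
    CHEQUE = "cheque"
    BANK_FEE = "bank_fee"
    CARD_PAYMENT = "card_payment"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class ParsedTransaction:
    amount: Decimal
    booked_on: date
    merchant_name: Optional[str]
    transaction_type: TransactionType


def parse_transaction(raw: dict) -> ParsedTransaction:
    """Build a ParsedTransaction from a raw dict with hypothetical keys."""
    try:
        tx_type = TransactionType(raw.get("type", "unknown"))
    except ValueError:
        # Unrecognised type strings fall back to UNKNOWN instead of failing.
        tx_type = TransactionType.UNKNOWN
    # Treat a missing or blank merchant as None.
    merchant = (raw.get("merchant") or "").strip() or None
    return ParsedTransaction(
        # Decimal avoids floating-point rounding on monetary amounts.
        amount=Decimal(str(raw["amount"])),
        booked_on=date.fromisoformat(raw["date"]),
        merchant_name=merchant,
        transaction_type=tx_type,
    )
```

For example, `parse_transaction({"amount": "12.50", "date": "2024-07-10", "merchant": " Just Eat ", "type": "card_payment"})` yields a record with the merchant name trimmed to `Just Eat` and the type set to `TransactionType.CARD_PAYMENT`.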

Open banking standards prescribe specific bank codes to identify transaction types like direct debits, funds transfers, and card payments. In practice, however, only a few banks adhere strictly to these standards; most use proprietary codes or a mix of standard and proprietary codes. Deciphering the meaning of each code required deep dives into the documentation for each bank’s API. Where a code was not clearly defined, we analysed hundreds of transactions carrying that code to understand its meaning. To tackle this challenge effectively, we organised into two teams, each comprising software engineers and data scientists.
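One way to picture this normalisation step is a per-bank lookup table that falls back to a shared table of standard codes, plus a helper that counts the codes still unmapped so they can be queued for manual analysis. Every bank name and code below is a made-up placeholder, not a real open banking or bank-specific code, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Placeholder "standard" codes shared by banks that follow the standard.
STANDARD_CODES: Dict[str, str] = {
    "STD_DD": "direct_debit",
    "STD_CARD": "card_payment",
    "STD_TRF": "funds_transfer",
}

# Placeholder proprietary codes, keyed by bank.
PROPRIETARY_CODES: Dict[str, Dict[str, str]] = {
    "bank_a": {"DDR": "direct_debit", "POS": "card_payment"},
    "bank_b": {"17": "atm_withdrawal", "42": "cheque"},
}


def normalise_code(bank: str, code: str) -> str:
    """Resolve a bank code to a transaction type, proprietary table first."""
    bank_table = PROPRIETARY_CODES.get(bank, {})
    return bank_table.get(code) or STANDARD_CODES.get(code) or "unknown"


def unmapped_codes(transactions: Iterable[Tuple[str, str]]) -> Counter:
    """Count (bank, code) pairs that no table can resolve yet."""
    counts: Counter = Counter()
    for bank, code in transactions:
        if normalise_code(bank, code) == "unknown":
            counts[(bank, code)] += 1
    return counts
```

Checking the proprietary table first lets a bank override a standard code that it uses with a different meaning, while the counter highlights which undocumented codes appear often enough to be worth investigating first.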

Once we had mapped each bank’s proprietary codes and understood whether they indicated a direct debit, card payment, ATM withdrawal, cheque deposit, or another transaction type, we worked with the product team, led by Tariq, to determine how best to assign these codes to specific spending categories. For example, transactions that banks label as incoming interest were placed under the “savings” category, which has subdivisions for savings accounts and investments. Similarly, transactions that banks label as bill payments were placed under the broader “bills” category, specifically within the “utilities” subcategory.
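The two examples above can be written as a small lookup from transaction type to a (category, subcategory) pair. This is a sketch under assumptions: the type labels and the name `categorise_by_type` are hypothetical, and because the post does not say which savings subdivision interest falls into, the subcategory is left as `None` here.

```python
from typing import Dict, Optional, Tuple

Category = Tuple[str, Optional[str]]

# Only the two mappings mentioned in the post are included.
TYPE_TO_CATEGORY: Dict[str, Category] = {
    "bill_payment": ("bills", "utilities"),
    # Subcategory unspecified in the post, so it is left unset here.
    "incoming_interest": ("savings", None),
}


def categorise_by_type(transaction_type: str) -> Optional[Category]:
    """Return a category when the transaction type alone determines it.

    Returns None for types that need further information to categorise.
    """
    return TYPE_TO_CATEGORY.get(transaction_type)
```

Returning `None` for types absent from the table, such as card payments in this sketch, leaves room for a later step to categorise those transactions using other information like the merchant name.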

Coming Next

In the next part of this series, we will discuss data cleaning and mappings: how we extracted merchant names for card payments and direct debits, linked these names to common retailers, and defined a mapping of merchant category codes into spend categories.
