Streamlining bank transaction categorisation at scale — Part 3

Mauriciotorob
3 min read · Aug 12, 2024


In parts 1 and 2 of our blog, we explored Cheddar’s journey in enhancing its Personal Finance Manager (PFM) by focusing on user-centric insights and transaction type mapping, followed by data cleaning and mapping processes. We detailed how we extracted and standardised merchant names and linked them to common retailers, ensuring accurate transaction categorisation. Coming up in part 3, we will dive into incorporating discriminative Artificial Intelligence (AI) to further refine transaction categorisation, including the use of numerical embeddings, developing a machine learning model, and conducting human testing for validation.

Part 3 — Adding AI to make it sing

6. Numerical Embedding based on Discriminative AI

In cases where transactions lacked a Merchant Category Code (MCC) and involved smaller businesses — like “convenience store John Street” or “SumUp * Taxi Driver XX YY” — our data scientists came up with a creative solution using AI and large language models (LLMs). They used these models to transform merchant names into numerical vectors with fixed dimensions, usually between 300 and 1000. The goal was to map similar merchants close to each other in this vector space, so related categories like restaurants or transport services would cluster together, while keeping distinct categories apart.

After testing different models, including Sentence-BERT and Google's Universal Sentence Encoder (USE), the team found that USE was the best fit for representing diverse merchant names. This choice significantly improved our ability to categorise transactions involving smaller merchants. To select the optimal numerical representation, the team evaluated several factors. They started by sampling merchant names and converting them into high-dimensional vectors. Using t-SNE, they reduced these vectors to two dimensions and created scatter plots to visually assess how well the embeddings grouped similar merchants. They also weighed model performance, noting that USE was notably faster than Sentence-BERT. Although USE is not open source, its strong performance and effective clustering of merchant names without any tuning made it the preferred choice.
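The assessment workflow above can be sketched as follows. This is a minimal illustration, not Cheddar's actual pipeline: since the USE weights aren't bundled here, random vectors stand in for the 512-dimensional USE embeddings, and the merchant names are hypothetical examples.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical merchant names; in the real pipeline each would be embedded
# with Universal Sentence Encoder. Random vectors stand in so this runs
# without downloading the model.
merchants = [
    "convenience store John Street",
    "SumUp * Taxi Driver XX YY",
    "Pizza Palace",
    "City Cabs",
    "Corner Grocery",
]
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(len(merchants), 512))  # stand-in for USE output

# Reduce to two dimensions for a scatter plot; perplexity must be
# smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=2, random_state=42)
points = tsne.fit_transform(embeddings)

for name, (x, y) in zip(merchants, points):
    print(f"{name}: ({x:.1f}, {y:.1f})")
```

With real embeddings, plotting `points` lets you eyeball whether taxis cluster away from grocers, which is exactly the visual check the team used to compare candidate models.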

7. Machine Learning Model

While the numerical embedding captures certain semantics of the merchant name, it alone falls short of providing a mapping of transactions to categories. To address this, we developed a machine learning model to predict transaction categories. Our approach involved leveraging existing transaction features like date, amount, and bank, while also introducing new features such as the day of the week, the count of numerical digits within the merchant name, and the maximum numerical value present in the merchant name. Crucially, we integrated the numerical embedding of the merchant name as a key feature.
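A sketch of that feature engineering step is shown below. The function name and feature keys are hypothetical (the post doesn't specify Cheddar's actual schema), but the features themselves follow the text: day of week, count of numeric digits in the merchant name, and the largest number appearing in it.

```python
import re
from datetime import date

def transaction_features(merchant: str, when: date, amount: float, bank: str) -> dict:
    """Illustrative feature set; names and structure are assumptions."""
    numbers = re.findall(r"\d+", merchant)
    return {
        "amount": amount,
        "bank": bank,
        "day_of_week": when.weekday(),  # 0 = Monday
        "digit_count": sum(len(n) for n in numbers),  # numeric digits in the name
        "max_number": max((int(n) for n in numbers), default=0),
        # in the real model, the merchant-name embedding is appended here
    }

feats = transaction_features("SumUp * Taxi Driver 42 7", date(2024, 8, 12), 12.5, "Monzo")
print(feats)
```

These engineered features sit alongside the embedding vector in the final training matrix, giving the model both semantic and structural signals about each transaction.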

Training both a random forest and an XGBoost model on this augmented dataset, we observed that XGBoost delivered superior performance for this task. We optimised for the weighted-average F1 score, which assesses the model's proficiency in classifying transactions across all categories. Importantly, this metric assigns greater weight to high-transaction-volume categories like supermarkets, while appropriately down-weighting categories with fewer transactions, such as reading.
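To make the metric concrete, here is a toy illustration of the weighted-average F1 score using scikit-learn. The labels are fabricated to mirror the imbalance described above: many supermarket transactions, few reading ones.

```python
from sklearn.metrics import f1_score

# Toy labels mirroring the class imbalance: supermarket dominates, reading is rare.
y_true = ["supermarket"] * 8 + ["reading"] * 2
y_pred = ["supermarket"] * 8 + ["reading", "supermarket"]  # one reading txn misclassified

# average="weighted" scales each class's F1 by its support, so the
# high-volume supermarket class dominates the overall score.
weighted = f1_score(y_true, y_pred, average="weighted")
macro = f1_score(y_true, y_pred, average="macro")
print(f"weighted F1 = {weighted:.3f}, macro F1 = {macro:.3f}")
```

Because the single error hits the rare class, the macro score (which treats classes equally) drops further than the weighted score, which is the behaviour the team wanted for volume-sensitive evaluation.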

8. Human Testing

In a collaborative effort involving all company employees, the data scientists distributed a subset of transactions — comprising a few thousand instances — to each team member, along with assigned spending categories. Each transaction in the human testing sample was independently assessed by three different individuals. Each participant was tasked with determining whether the assigned category was accurate, acceptable but improvable, or definitively incorrect. Following individual assessments, a collective discussion ensued, allowing for the identification of common concerns, suggestions, and areas for improvement in the categorization process. This human-centric testing proved invaluable in refining our categorization system, ensuring a more accurate and user-friendly experience for our platform users.
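The three-reviewer setup can be aggregated along these lines. This is a hedged sketch, not the team's actual tooling: the tie-breaking rule (escalating to the most cautious verdict when no majority exists) is an assumption for illustration.

```python
from collections import Counter

# The three verdict labels described above, ordered from best to worst.
VERDICTS = ("accurate", "improvable", "incorrect")

def consensus(votes: list[str]) -> str:
    """Majority verdict across reviewers; on a three-way tie, escalate to
    the most cautious (worst) verdict cast. The tie rule is an assumption."""
    counts = Counter(votes)
    top, n = counts.most_common(1)[0]
    if n > len(votes) // 2:
        return top
    return max(votes, key=VERDICTS.index)

print(consensus(["accurate", "accurate", "improvable"]))   # clear majority
print(consensus(["accurate", "improvable", "incorrect"]))  # no majority: escalate
```

Transactions whose consensus lands on "improvable" or "incorrect" are the natural candidates to surface in the follow-up group discussion.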

Coming Next

In the final part of this blog, we’ll dive into the exciting final steps of our project. We’ll explore how our data scientists and engineers collaborated to create a robust API for transaction categorization and discover how they used FastAPI and Pydantic to ensure efficient and reliable data processing. We’ll also detail how we automated the deployment of this API using GitHub, Docker, and Google Cloud Platform’s Vertex AI, culminating in a seamless integration with the Cheddar App. Stay tuned to see how these technologies come together to enhance our platform’s capabilities!
