Streamlining bank transaction categorisation at scale — Part 4

Mauriciotorob
4 min read · Aug 20, 2024

In Parts 1 and 2 of this series, we explored Cheddar’s efforts to enhance its Personal Finance Manager (PFM) by focusing on user insights, transaction mapping, and data standardisation. Part 3 introduced AI techniques, including numerical embeddings and machine learning models, to further refine transaction categorisation, supported by extensive human testing. Now, in Part 4, we explore how these innovations are applied in production to categorise transactions in Cheddar’s PFM.

Part 4 — Going to production

9. API Creation

After completing the human review process, our data scientists teamed up with data engineers to create an efficient and robust data workflow. This collaboration led to the development of an Application Programming Interface (API) that encapsulates all the logic for transaction categorization. The API processes batches of transactions and returns their corresponding categories, ensuring seamless and accurate categorization.

The web API was developed using FastAPI, with Pydantic integrated to enforce strict type checking, safeguarding the integrity and consistency of the input data. The mappings of retailers and merchant category codes were stored in in-memory dictionaries, while the machine learning model was loaded using TensorFlow Hub and also kept in memory as part of the API.

To ensure reliability, the team defined unit tests covering a variety of transaction cases, specifying the expected category or error message for each. They also assessed the API’s performance, checking that it could categorise 1,000 transactions in under 100 milliseconds. Finally, they pinned the required libraries, their versions, and the necessary Python version to guarantee correct operation.
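The testing approach can be sketched as a table of cases plus a simple timing check. The `categorise_batch` function below is only a stand-in for the real categorisation logic, and the cases and keyword table are invented for illustration.

```python
import time
from typing import Dict, List

# Stand-in for the real categorisation logic, which is not shown in this article.
KEYWORD_TO_CATEGORY: Dict[str, str] = {"sainsbury": "Groceries", "amazon": "Shopping"}


def categorise_batch(descriptions: List[str]) -> List[str]:
    results = []
    for description in descriptions:
        if not isinstance(description, str) or not description.strip():
            raise ValueError("description must be a non-empty string")
        text = description.lower()
        category = next((c for k, c in KEYWORD_TO_CATEGORY.items() if k in text), "Other")
        results.append(category)
    return results


# Table of test cases: input description -> expected category.
CASES = [
    ("Sainsbury's Supermarket", "Groceries"),
    ("AMAZON UK RETAIL", "Shopping"),
    ("UNKNOWN MERCHANT 123", "Other"),
]


def run_unit_cases() -> None:
    for description, expected in CASES:
        assert categorise_batch([description]) == [expected], description
    # Invalid input should produce a clear error rather than a category.
    try:
        categorise_batch([""])
    except ValueError:
        pass
    else:
        raise AssertionError("empty description should be rejected")


def time_batch(n: int = 1000) -> float:
    """Return the wall-clock time in milliseconds to categorise n transactions."""
    batch = [CASES[i % len(CASES)][0] for i in range(n)]
    start = time.perf_counter()
    categorise_batch(batch)
    return (time.perf_counter() - start) * 1000.0


run_unit_cases()
elapsed_ms = time_batch()
print(f"1,000 transactions categorised in {elapsed_ms:.2f} ms (target: under 100 ms)")
```

In a real project these cases would more likely live in a test framework such as pytest, and the timing would be measured against the deployed API rather than a local function.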

10. Cloud Deployment

In the final phase, our data engineers streamlined the deployment of our API using GitHub Actions, automating the process. This included building a Docker image that contained all necessary API components, which was then automatically uploaded to the container registry within Google Cloud Platform’s Vertex AI. GitHub then triggered the creation of a model from this image, deploying it as a private endpoint in Vertex AI specifically designed for online predictions. This private endpoint is securely integrated with the backend of the Cheddar App, enabling seamless access to our categorization API.

To monitor the service, the data scientists established dashboards focusing on three types of metrics: technical, machine learning, and business. For technical monitoring, they used Google Cloud Platform (GCP) tools to create a dashboard that alerts when more than 5 out of 10,000 API responses return a status code other than 200 within a 5-minute window, which indicates potential errors in either the API itself or the GCP infrastructure. The team also set up a dashboard to monitor API scalability by tracking periods when more than one instance is required to handle requests, a scenario that should be rare given the current user base.
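The alert rule itself is simple enough to express in a few lines. The sketch below is not how GCP implements alerting; it only illustrates the rule, under the assumption that "more than 5 out of 10,000" is read as an error ratio above 0.05% over the responses seen in the last 5 minutes. The class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple

WINDOW_SECONDS = 5 * 60        # 5-minute window
MAX_ERROR_RATIO = 5 / 10_000   # alert above 5 errors per 10,000 responses


class ErrorRateMonitor:
    """Sliding-window error ratio over (timestamp, status code) events."""

    def __init__(self) -> None:
        self._events: Deque[Tuple[float, bool]] = deque()
        self._errors = 0

    def record(self, timestamp: float, status_code: int) -> None:
        is_error = status_code != 200
        self._events.append((timestamp, is_error))
        self._errors += int(is_error)
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that are older than the window.
        while self._events and self._events[0][0] <= now - WINDOW_SECONDS:
            _, was_error = self._events.popleft()
            self._errors -= int(was_error)

    def should_alert(self) -> bool:
        total = len(self._events)
        return total > 0 and self._errors / total > MAX_ERROR_RATIO
```

One consequence of using a ratio is that a single failure during a quiet period trips the alert; a production rule might also require a minimum number of requests in the window.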

For machine learning metrics, the scientists developed a dashboard that continuously assesses the weighted precision and accuracy of the ML model by comparing API results against known transaction categories. It also tracks the source of each category label (the ML model or the predefined mappings).
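For readers unfamiliar with the two metrics, here is a small pure-Python sketch. Weighted precision is taken here in its usual sense: the precision of each category, averaged with weights proportional to how often that category appears in the known labels. The sample labels are invented for illustration.

```python
from collections import Counter
from typing import List


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of transactions whose predicted category matches the known one."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def weighted_precision(y_true: List[str], y_pred: List[str]) -> float:
    """Per-category precision, weighted by each category's share of y_true."""
    support = Counter(y_true)      # how often each category really occurs
    predicted = Counter(y_pred)    # how often each category was predicted
    true_positive = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # A category that was never predicted contributes a precision of 0.
        precision = true_positive[label] / predicted[label] if predicted[label] else 0.0
        score += precision * n_true / total
    return score


known = ["Groceries", "Groceries", "Transport", "Shopping"]
from_api = ["Groceries", "Transport", "Transport", "Shopping"]
print(accuracy(known, from_api))            # 0.75
print(weighted_precision(known, from_api))  # 0.875
```

In practice a library routine such as scikit-learn’s `precision_score` with `average="weighted"` would normally be used instead of hand-written code.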

Lastly, in collaboration with the product team, they defined key business metrics, such as the distribution of transactions across categories and the proportion of total spending per category. A dedicated dashboard tracks these metrics over time, with alerts set to notify the team if any proportions deviate from expected norms.
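As an illustration of these business metrics, the sketch below computes each category’s share of transactions and of total spending, then flags categories whose share drifts from an expected value. The function names, the expected shares and the 10-percentage-point tolerance are assumptions, not the thresholds the team uses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def category_shares(
    transactions: Iterable[Tuple[str, float]]
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (share of transaction count, share of total spend) per category."""
    counts: Dict[str, int] = defaultdict(int)
    spend: Dict[str, float] = defaultdict(float)
    for category, amount in transactions:
        counts[category] += 1
        spend[category] += abs(amount)
    n = sum(counts.values())
    total = sum(spend.values())
    count_share = {c: counts[c] / n for c in counts}
    spend_share = {c: (spend[c] / total if total else 0.0) for c in spend}
    return count_share, spend_share


def deviations(
    observed: Dict[str, float], expected: Dict[str, float], tolerance: float = 0.10
) -> List[str]:
    """Categories whose observed share differs from the expected share by more than tolerance."""
    categories = set(observed) | set(expected)
    return sorted(
        c for c in categories
        if abs(observed.get(c, 0.0) - expected.get(c, 0.0)) > tolerance
    )


sample = [("Groceries", 40.0), ("Groceries", 20.0), ("Transport", 10.0), ("Shopping", 30.0)]
count_share, spend_share = category_shares(sample)
expected_spend = {"Groceries": 0.4, "Transport": 0.15, "Shopping": 0.3, "Eating out": 0.15}
print(deviations(spend_share, expected_spend))  # ['Eating out', 'Groceries']
```

A category that disappears entirely, like "Eating out" in this sample, is flagged as well, which is often the more interesting signal because it can point to a broken mapping rather than a change in user behaviour.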

Conclusions

Cheddar has aimed to revolutionise banking with user-friendly financial solutions that simplify payments and democratise rewards. Our cashback program has not only earned high praise from users but also contributed to our recent accolades, including the prestigious ‘Best Newcomer’ award in 2024. We are dedicated to further enhancing the banking experience by making it more accessible and rewarding for everyone. Exciting plans are underway, including the recent launch of a versatile Personal Finance Manager (PFM) designed to empower users across various bank accounts with intuitive financial management tools.

At the heart of our PFM lies transaction categorisation. This process, detailed across the four parts of this series, highlights our use of technologies like machine learning, NLP, and AI to accurately classify transactions. From understanding user needs and refining spending categories to integrating AI for enhanced categorisation, each phase has been pivotal in ensuring a seamless financial management experience for our users. The effort culminated in an API-driven solution deployed on cloud infrastructure. As we continue to expand and refine our offerings, Cheddar remains committed to its mission to redefine banking, making it more transparent and accessible for all.

Future steps

As we continue developing the PFM feature for the Cheddar app, our primary focus is on enhancing the user experience by grouping transactions into categories that are both intuitive and insightful. We’re also working on introducing personalised suggestions tailored to each user’s spending patterns, aiming to offer recommendations that help users save more effectively.

On the technical front, we’re exploring the possibility of training a character-based large language model from scratch. Such a model would need to comprehend the unique language of bank transactions: a specialised language filled with proper nouns like PayPal, eBay, and Amazon; common words like store, supermarket, and bank; abbreviated terms; and a variety of symbols, such as asterisks, parentheses, and dashes, alongside digit sequences of varying lengths. The model should also handle the intricacies of spacing and abbreviation within merchant names, which can significantly alter the surface form while preserving the same meaning. For instance, variations like “PayPal * sainsburyssupermar,” “Sainsbury’s Supermarket,” and “PAYPAL SAINSBURY 6676” may look different but refer to the same retailer. Our goal is to ensure the model can accurately interpret these nuances, leading to more precise transaction categorisation and a better user experience.
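To give a flavour of what "character-based" means, the sketch below builds a character vocabulary from the three example strings and encodes a description as a fixed-length list of integer ids, which is the kind of input a character-level model consumes. It is only an illustration: the special tokens, the maximum length of 32 and the function names are assumptions, and no model is trained here.

```python
from typing import Dict, List

EXAMPLES = [
    "PayPal * sainsburyssupermar",
    "Sainsbury's Supermarket",
    "PAYPAL SAINSBURY 6676",
]

PAD, UNK = "<pad>", "<unk>"


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Map every distinct character to an integer id; 0 and 1 are reserved."""
    vocab = {PAD: 0, UNK: 1}
    for ch in sorted({ch for text in texts for ch in text}):
        vocab[ch] = len(vocab)
    return vocab


def encode(text: str, vocab: Dict[str, int], max_len: int = 32) -> List[int]:
    """Encode text character by character, truncating or padding to max_len."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in text[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


vocab = build_vocab(EXAMPLES)
for text in EXAMPLES:
    print(text, "->", encode(text, vocab))
```

Because nothing is lower-cased or stripped, the asterisk, the apostrophe, the digits and the letter case all reach the model, which is what would let it learn that the three strings above point to the same retailer.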

Embark on the journey

Are you intrigued by the realm of Bank Transaction Categorization? Are you eager to define categories and craft a machine learning model that predicts them? Perhaps you are keen on exploring various AI language models to identify the perfect fit for analysing bank transactions? Your journey may begin by diving into an open dataset of bank transactions, accessible on Kaggle at https://www.kaggle.com/datasets/apoorvwatsky/bank-transaction-data. Happy exploring!

Mauriciotorob

Remote Staff Data Scientist @ Best Newcomer in the British Bank Awards 2024 | Speaker @ AI in financial services conferences in London