Enabling Autonomous Decisions in Santa’s Workshop: Data Mesh at Avanza Bank

Martin Appelqvist · Avanza Tech · Dec 21, 2023

How does Santa know that he has succeeded in making the children of the world happy? This is the story of Avanza Bank’s journey from a centralised data and analytics architecture to a distributed, domain-driven data mesh.

Santa’s workshop

We’ll start in a familiar place that we all know of: Santa’s workshop, where all the Christmas presents are made.

Santa’s workshop, where workers produce toys and gifts while Santa keeps a ledger tracking all the feedback on the toys.

Santa’s goal is clear but complicated: make every child happy with gifts. In the workshop, teams of gnomes and elves create various toys. Let’s call them “teams.” Some handle base components, others assemble toys, and some craft exclusive combinations, each excelling in specific areas.

The teams create, create, and then they create some more. Then Santa goes out and delivers the gifts to all the children. When he’s done he can relax, knowing that he has done his deed: he has given the children happiness. Or has he? How does he know? Because he measures it!

Santa receives letters from children, sharing their joy or disappointment. He gauges their reactions, and may even have ways to listen in or receive feedback through the toys themselves. Santa then meticulously turns this feedback into insights for the next year: millions of children, toys, and products. That seems like quite a lot for one Santa, doesn’t it? And he doesn’t even know how the toys work; he is just Santa.

This setup sounds kind of familiar, doesn’t it? That’s because it has been the case in basically every centralised data and analytics department of larger organisations. And it was also how things were at Avanza not so long ago.

About Avanza Bank

Avanza logo and applications

Avanza is an online bank with 1.8 million customers in Sweden and a portfolio of products spanning stock trading, bonds, mortgages and other banking services. We believe in creating a better future for millions of people by helping them save better, and we do this through tech. It is a company where tech and business come together to create fantastic products for our customers.

Data and analytics at Avanza Bank historically

Avanza’s development organisation is built on autonomy. Teams handling customer-facing products fully own their domains, encompassing ideation, development, testing, regulatory compliance, and analytics. The microservice-based architecture relies on in-memory databases and swift direct communication to achieve the rapid response times crucial in banking, where milliseconds matter. While effective for applications, this architecture presents challenges for data and analytics, leading to data silos and accessibility issues.

Avanza’s teams working in autonomous domains

So how did we gain insights before? Like many companies, we utilized a monolithic data and analytics system, primarily composed of a SQL Server and a QlikView application for visualization.

Centralised data and analytics: Avanza’s legacy architecture, built upon a central SQL Server and QlikView, plus the many scripts and hacks the data team built to get hold of data.

Without a structured data retrieval method, data engineers, who were separate from development teams, had to resort to various one-offs to extract data. Subsequently, analysts, lacking insight into data quality, struggled to verify assumptions. The disconnect between application-owning teams and data engineers led to confusion. Adding to the problem was Google Analytics data, which couldn’t seamlessly integrate with business intelligence data.

A technical leap was needed!

The Technical Leap

The technical leap was achieved in three steps:

  1. Introduction of Kafka: All analytics-bound data was directed through Kafka. This facilitated both a unified channel for data and an event-driven approach in microservices.
  2. Introduction of Google BigQuery: BigQuery was brought in as the workhorse for storage and compute. A custom application was developed for ingestion and pseudonymization of direct personal identifiers, ensuring customer data integrity (a sketch of the idea follows this list). With data easily accessible, we were poised for the next step.
  3. Adoption of dbt (data build tool) for data transformation: Controlling the data transformation process became crucial. The decision to incorporate dbt as our transformation framework addressed this need.
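
To make the ingestion step concrete, here is a minimal sketch of deterministic pseudonymization: replacing direct identifiers with a keyed hash before events land in BigQuery. The field names and the choice of HMAC-SHA256 are illustrative assumptions, not Avanza’s actual implementation.

```python
import hashlib
import hmac
import json

# Hypothetical secret; in practice it would come from a managed secret store.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymise(value: str) -> str:
    """Deterministically pseudonymise an identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

def transform_event(raw: bytes) -> dict:
    """Replace direct personal identifiers in a raw Kafka event before loading it."""
    event = json.loads(raw)
    for field in ("customer_id", "national_id"):  # hypothetical identifier fields
        if field in event:
            event[field] = pseudonymise(str(event[field]))
    return event

print(transform_event(b'{"customer_id": "12345", "action": "buy_stock"}'))
```

A keyed hash keeps the mapping stable, so the same customer always maps to the same pseudonym across events, without storing a lookup table.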

By integrating Looker for business intelligence and Jupyter notebooks for exploration, we've assembled the components of a modern data stack.

Avanza Data Tech Stack: operational applications communicate via Kafka for data and event-driven approaches; ingestion through a custom app which does pseudonymisation; storage and transformation with BigQuery and dbt; visualisation with Jupyter + Python and Looker.

With a modern stack in place we had quite a capable platform. As more and more data flowed through Kafka to BigQuery the opportunities for insights and data driven applications increased. But there was still an itch.

Let’s look at Santa’s workshop again.

Santa’s workshop, where workers produce toys and gifts while Santa keeps a ledger tracking all the feedback on the toys.

Is this situation really solved? Sure, we’ve given Santa a supercomputer. Maybe we’ve even given him some cool gadgets to help him keep track of what he is doing. But… there are still problems.

  1. Santa lacks knowledge about specific toy categories — unable to distinguish a dumper from a truck or a plush bear from a teddy panda. In simple terms, he lacks domain expertise.
  2. He is still just one person. His time is severely limited.

We should be able to solve this! Maybe we can give the teams of elves access to their own data? Maybe we can let them explore and generate their own insights and aggregations? Maybe we can let them share their understanding with the rest of the organisation?

This is, in essence, the concept of a data mesh.

Data mesh at Avanza

Data mesh is an architectural pattern for distributed data, introduced by Zhamak Dehghani in 2019 and further developed in the book Data Mesh: Delivering Data-Driven Value at Scale (2022).

Data Mesh Principles

Data Mesh is based upon four principles:

  1. Domain-Driven Data Ownership Architecture
    Domain teams have a deeper understanding of their domain, and thereby of their data, than anyone else. Because of this, each team is responsible for the end-to-end data lifecycle within its domain.
  2. Self-Serve Data Infrastructure as a Platform
    For domain teams to be efficient, a self-serve platform must exist that lets them get to work quickly with standardised tools.
  3. Federated Computational Governance
    Each team is responsible for governing its own data products, but does so through shared standards and automation to ensure interoperability throughout the platform (see the sketch after this list).
  4. Data as a Product
    Data products are designed, built, and operated with a product mindset. Teams are accountable for the success of their data products, their usage, and their lifecycle.
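
As a toy illustration of what “computational” governance can mean, here is a minimal sketch that validates a data product descriptor against a couple of shared standards. The descriptor format and every field name are hypothetical, not what Avanza actually uses.

```python
# Hypothetical descriptor that each data product ships alongside its code.
PRODUCT_DESCRIPTOR = {
    "name": "toy-feedback",
    "owner_team": "plush-toys",
    "outputs": [
        {"table": "toy_feedback.daily_sentiment", "contains_raw_pii": False},
    ],
}

REQUIRED_KEYS = {"name", "owner_team", "outputs"}

def validate_descriptor(descriptor: dict) -> list[str]:
    """Return a list of governance violations for a data product descriptor."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS - descriptor.keys()]
    for output in descriptor.get("outputs", []):
        # Shared standard: outputs must never expose raw personal identifiers.
        if output.get("contains_raw_pii"):
            errors.append(f"{output['table']}: raw PII outputs are not allowed")
    return errors

assert validate_descriptor(PRODUCT_DESCRIPTOR) == []
```

Checks like these can run automatically for every data product, which is what keeps governance federated yet interoperable.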

For anyone working in application development this might sound like old news, as it is akin to the concepts of domain-driven design and microservices. But in the data world… this is a game changer.

As Avanza’s teams were already autonomous in their domains, we took inspiration from the ideas behind data mesh and came up with our first implementation of a data product architecture. There may be more implementations in the future, but this is our initial approach to the concept.

The implementation is a packaging of the source code and tools needed to create valuable data: in our case dbt, BigQuery, access management and, importantly, code which delivers metadata on published outputs to a component we call the “data registry”. This information can then be used to automate complex flows and also enables monitoring (a sketch follows below). The development teams can bootstrap their own products and get to work modelling their domains in a matter of minutes.

Technical implementation of a data product at Avanza: it takes some inputs, transforms them, and publishes outputs.
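
As a minimal sketch of the metadata-delivery part, a data product might register each published output roughly like this. The registry endpoint, the HTTP API, and all field names are assumptions for illustration; the real “data registry” interface is not described here.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

import requests  # assumes the registry exposes a plain HTTP API

DATA_REGISTRY_URL = "https://data-registry.example.internal/api/v1/outputs"  # hypothetical

@dataclass
class OutputMetadata:
    """Metadata describing one published output of a data product."""
    product: str
    output_table: str  # fully qualified BigQuery table
    schema_version: str
    owner_team: str
    published_at: str

def publish_output_metadata(meta: OutputMetadata) -> None:
    """Register a published output so automation and monitoring can discover it."""
    response = requests.post(DATA_REGISTRY_URL, json=asdict(meta), timeout=10)
    response.raise_for_status()

publish_output_metadata(OutputMetadata(
    product="toy-feedback",
    output_table="analytics.toy_feedback.daily_sentiment",
    schema_version="1.2.0",
    owner_team="plush-toys",
    published_at=datetime.now(timezone.utc).isoformat(),
))
```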

From a high-level overview, it looks like this.

Data products in a data mesh: coupled from output to input, each publishing its outputs to the data registry.
  1. Raw data is put into a landing dataset from Kafka.
  2. The data products then take over; they prune, clean and refine the data, and register their published results in the “data registry” (a sketch of such a refinement step follows this list).
  3. The outputs are then used for various use cases.
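
As a rough sketch of step 2, a refinement job might look something like the following. In practice this transformation logic lives in dbt models; the project, dataset, table, and column names below are made up for illustration.

```python
from google.cloud import bigquery

client = bigquery.Client(project="avanza-analytics")  # hypothetical project

# Prune and refine raw Kafka events from the landing dataset into an output
# table owned by the data product. All names are illustrative.
query = """
CREATE OR REPLACE TABLE `avanza-analytics.toy_feedback.refined_events` AS
SELECT
  event_id,
  pseudonymised_customer_id,
  toy_category,
  sentiment,
  TIMESTAMP_MILLIS(event_time_ms) AS event_time
FROM `avanza-analytics.landing.toy_feedback_raw`
WHERE event_id IS NOT NULL  -- prune malformed events
"""
client.query(query).result()  # blocks until the job completes
```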

With this approach, our development teams create value for themselves and others. At Avanza, we’ve witnessed an organic growth of data products with use cases spanning reporting, KPI measurements, machine learning, and personalization. Crucially, teams operate independently without relying on dedicated data engineering roles or a center of excellence. This autonomy is what we aim to foster, empowering teams to make quicker data-driven decisions on their own.

While we have enabled our teams to work with data more independently, there are still some things we are working to perfect.

  1. With a data mesh approach, it is hard to know which team should be responsible for central KPIs and analytics.
  2. Data governance standards need constant work and attention, as there is no central authority working with the data and more people have access to it.
  3. Orchestration of pipelines needs to be automatic and efficient. Scheduled pipelines work for now, but as the mesh grows, so will the complexity, and simple scheduling won’t be enough.

Conclusion

So then, should you just adopt this approach? Maybe! It is not a one-size-fits-all solution, as it is fairly complex. Simply put, it might not be the go-to solution for every company. Here are some prerequisites to make it work:

  1. Your teams are truly autonomous and have clear domains, and you have a strategy where you trust your teams to deliver in their domain.
  2. The timing is right. It is a large cultural and technical shift; make sure the organisation is ready for it.
  3. You have a technical platform that makes it easy.
  4. Your company is of adequate size. It doesn’t make sense to implement data mesh for a small startup with just a few people working in data.

Given these circumstances, it really can be a route to a more data-driven company. And if you do adopt it, remember to bet on the knowledge and capabilities of your teams. Trust that they know their domain, trust that they can innovate on their own, and trust that they have the capacity to find value in your data. This is what we do at Avanza.


Martin Appelqvist
Technical Product Manager, Data & Insights Platform at Avanza Bank