Decision engine, part II: Cluster as a source of truth

Michał Lebida · Published in AzimoLabs
Aug 9, 2018 · 4 min read

“Building a decision engine without a data source would be like opening a coffee shop without any coffee.”

In my last article, I wrote about how we built a decision engine to automatically reward Azimo’s customers depending on certain criteria.

A simple example might be to give someone a voucher on their birthday. More complex examples might take into account how many times a user has logged in, how many transfers they have made using Azimo, what kind of payment method they used, and so on.
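
As a rough illustration of what such criteria can look like, here is a tiny, hypothetical sketch of rules expressed as plain predicates. The Customer fields and rule names are invented and far simpler than the real model:

```scala
import java.time.{LocalDate, MonthDay}

object RuleSketch {

  // Hypothetical, heavily simplified view of a customer.
  final case class Customer(birthday: MonthDay, logins: Int, transfers: Int)

  // A rule is just a predicate over the data we know about a customer.
  type Rule = Customer => Boolean

  // "Give someone a voucher on their birthday."
  val birthdayVoucher: Rule =
    customer => MonthDay.from(LocalDate.now) == customer.birthday

  // A more complex rule combining several data points.
  val frequentSenderReward: Rule =
    customer => customer.transfers >= 10 && customer.logins >= 20
}
```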

In this article I’m going to outline how we built a fast and reliable solution to handle the data that powers our decision engine, and how Akka helps us to achieve this.

Data architecture

Building a decision engine without a data source would be like opening a coffee shop without any coffee. A fancy coffee machine is great, but without something to put in it, you haven’t got a product.

Our decision engine is based on two data layers. One is for real-time data, so that we have all the up-to-date information in the system. The other is for historical data, so that we can check past events for anything that might be important.

This approach is called Lambda architecture. It has two paths: a batch layer, where we keep all the data in one place, and a speed layer, where the real-time data lives.

Lambda architecture

Keeping data where it belongs

Event sourcing

Lambda architecture handles the data flow, but we also need a way to consume, save and share the data. We chose to use event sourcing.

What is event sourcing?

In the traditional model of recording object states, we save only the last version of the state, meaning that all prior states are not available for analysis. In event sourcing, we write all events related to the given object into an event store. We can then use the event store to find out who changed what and when it happened. This gives us far more data to draw on, but results in larger logs. The larger the log, the more time it takes us to query it, slowing down the system overall. That’s why we create snapshots of the log at regular intervals. If we only need data from, say, Tuesday, we can simply query the Tuesday snapshot rather than the entire data set.
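
Jumping slightly ahead to the implementation, here is a minimal sketch of this pattern with akka-persistence (we come back to the stack below). The message types, state and snapshot interval are invented for illustration; only the pattern itself (persist every event, recover from the latest snapshot plus any newer events) reflects what is described above:

```scala
import akka.actor.Props
import akka.persistence.{PersistentActor, SnapshotOffer}

// Hypothetical messages and state, shared by the sketches in this article.
final case class CustomerEvent(customerId: String, payload: String)
final case class GetEventCount(customerId: String)
final case class CustomerState(eventCount: Int = 0) {
  def updated(evt: CustomerEvent): CustomerState = copy(eventCount = eventCount + 1)
}

// One persistent actor per customer: its journal is that customer's event store.
class CustomerActor extends PersistentActor {

  // With cluster sharding (sketched further down), the actor's name is the customer id.
  override def persistenceId: String = s"customer-${self.path.name}"

  private var state = CustomerState()
  private val snapshotEvery = 100 // snapshot at regular intervals so recovery stays fast

  // Recovery: start from the latest snapshot, then replay only the newer events.
  override def receiveRecover: Receive = {
    case SnapshotOffer(_, snapshot: CustomerState) => state = snapshot
    case evt: CustomerEvent                        => state = state.updated(evt)
  }

  // Normal operation: persist every incoming event before touching in-memory state.
  override def receiveCommand: Receive = {
    case evt: CustomerEvent =>
      persist(evt) { e =>
        state = state.updated(e)
        if (lastSequenceNr % snapshotEvery == 0) saveSnapshot(state)
      }
    case GetEventCount(_) => sender() ! state.eventCount
  }
}

object CustomerActor {
  val props: Props = Props(new CustomerActor)
}
```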

Implementation

So we now have a model for handling the data flow and a model for storing and managing the data. Next we need to decide how to code the project. We already write a lot of our projects in Scala with the Akka toolkit, so it made sense to use the same stack 😎.

We use akka-streams for data streaming, with BigQuery as the batch layer and Kafka serving the speed layer. For object persistence (in our case each entity is an Azimo customer) we use akka-persistence. To connect all of this reliably, we use akka-cluster.
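
As a rough sketch of how the ingestion side can be wired together with akka-streams and the Alpakka Kafka connector (akka-stream-kafka): the topic name, consumer group, event type and the left-abstract batch source are illustrative assumptions, not our production code.

```scala
import akka.actor.{ActorRef, ActorSystem}
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source
import org.apache.kafka.common.serialization.StringDeserializer

// CustomerEvent comes from the event-sourcing sketch above.
object EventIngestion {

  // `domain` is the entry point to the per-customer actors (see the next section).
  def run(domain: ActorRef)(implicit system: ActorSystem): Unit = {
    implicit val mat: ActorMaterializer = ActorMaterializer()

    val consumerSettings =
      ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
        .withBootstrapServers("kafka:9092")
        .withGroupId("decision-engine")

    // Speed layer: up-to-date events arriving on a Kafka topic.
    val speedLayer: Source[CustomerEvent, _] =
      Consumer
        .plainSource(consumerSettings, Subscriptions.topics("customer-events"))
        .map(record => CustomerEvent(record.key, record.value))

    // Batch layer: historical events, e.g. a BigQuery export read as a stream
    // (left abstract here).
    val batchLayer: Source[CustomerEvent, _] = Source.empty

    // The consumer role sees one unified stream and forwards every event
    // to the matching customer actor.
    speedLayer
      .merge(batchLayer)
      .runForeach(event => domain ! event)
  }
}
```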

How does it work?

The backbone of the implementation is a cluster which we divided into three parts, or in Akka nomenclature ‘node roles’:

  • Consumer — this node role is responsible for handling incoming events from both the speed and batch layers. Each incoming message is then pushed to an actor in the domain region.
  • Domain — the region in which each actor corresponds to a single user in our system. Each of those actors has its own event storage (as per event sourcing) thanks to the akka-persistence module. We also use cluster sharding to scale more easily. Cluster sharding distributes actors across the cluster in a consistent fashion, which allows us to find an actor quickly and to access that user's event history (a sketch of this wiring, together with a minimal API route, follows this list).
  • API — this node role serves the data through HTTP endpoints.
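
Below is a hedged sketch of how the domain and API roles might be wired together with cluster sharding and akka-http. The type name, shard count, message types and route are assumptions made for illustration; CustomerActor, CustomerEvent and GetEventCount refer to the event-sourcing sketch earlier in the article.

```scala
import akka.actor.{ActorRef, ActorSystem}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

object CustomerSharding {

  private val numberOfShards = 100

  // Any message carrying a customer id can be routed to that customer's actor.
  private val extractEntityId: ShardRegion.ExtractEntityId = {
    case evt: CustomerEvent => (evt.customerId, evt)
    case q: GetEventCount   => (q.customerId, q)
  }

  private val extractShardId: ShardRegion.ExtractShardId = {
    case evt: CustomerEvent => shardId(evt.customerId)
    case q: GetEventCount   => shardId(q.customerId)
  }

  private def shardId(customerId: String): String =
    math.abs(customerId.hashCode % numberOfShards).toString

  // Domain nodes host the entity actors...
  def start(system: ActorSystem): ActorRef =
    ClusterSharding(system).start(
      typeName        = "customer",
      entityProps     = CustomerActor.props,
      settings        = ClusterShardingSettings(system).withRole("domain"),
      extractEntityId = extractEntityId,
      extractShardId  = extractShardId
    )

  // ...while consumer and API nodes only need a proxy to reach them.
  def proxy(system: ActorSystem): ActorRef =
    ClusterSharding(system).startProxy(
      typeName        = "customer",
      role            = Some("domain"),
      extractEntityId = extractEntityId,
      extractShardId  = extractShardId
    )
}

// A tiny API-role route that asks a customer's actor for part of its state.
object CustomerApi {
  def routes(domain: ActorRef): Route = {
    implicit val timeout: Timeout = 3.seconds
    path("customers" / Segment / "event-count") { customerId =>
      get {
        onSuccess((domain ? GetEventCount(customerId)).mapTo[Int]) { count =>
          complete(count.toString)
        }
      }
    }
  }
}
```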

This approach has some pros and cons.

Pros:

  • Separation of concerns thanks to cluster roles.
  • Fault tolerance.
  • Load balancing thanks to sharding.
  • Easy scaling.
  • Easy to implement new features/models after you understand the basics.
  • Happy reporting, with event history stored per actor in a reliable datastore.

Cons:

  • Eventual consistency.
  • High barrier to entry for new developers.
  • Hard to test and debug things comprehensively.

Final words

This article is a quick overview of how we’ve built infrastructure that can quickly and easily check our user data. Azimo has thousands of data points about its users, which can be combined into thousands of potential permutations. If you have any questions about the system or its potential uses, feel free to ask in the comments below 🤓.

Towards financial services available to all

We’re working throughout the company to create faster, cheaper, and more available financial services all over the world, and here are some of the techniques that we’re utilizing. There’s still a long way ahead of us, and if you’d like to be part of that journey, check out our careers page.
