typebus: A modern guide for building distributed systems with Scala and Akka

PART 1

Building distributed systems is hard. Both the practice of high-level design and the tradeoffs made while executing that design elude many engineers. Often what appears to be a microservice design on paper deploys as a distributed monolith in practice. It is therefore critical to understand the design process, how it maps to your microservice architecture, and the tradeoffs it forces.

Distributed systems are about tradeoffs. Deciding how your system will scale and recover from failure involves tuning data consistency against availability. This document will demonstrate that you can tip things in your favour: by constructing a system that is highly compartmentalized, we can tune these conditions at a very fine level.

I have divided this document into multiple parts. In this first part I discuss the types of tradeoffs and some of the ways that typebus helps to address them. I also talk about Domain Driven Design principles and how these map to typebus microservices. The next part dives into typebus code and walks through some examples of what is provided. Following that I talk about exposing gateway services for your public API. Last, I dive into event sourcing and CQRS.

Throughout the document we use code examples in Scala, Akka and my framework, typebus. However, the principles could be abstracted to any system that uses the Actor pattern. If you’re not familiar with the Actor pattern, you can find more information on Actors and Akka in the Akka documentation.

About Tradeoffs

Complexity and Scalability

Microservice design often introduces complexity. This complexity comes in the form of more services to manage, deploy and maintain. Because service design promotes loose coupling, dependencies and message flow can be difficult to see and reason about.

The benefits however include high scalability, failure isolation, team autonomy and rapid iteration. typebus in turn provides further tooling to visualize and debug message passing, relieving some of the complexity around management and debugging of your service architecture.

Consistency and Availability

The infamous CAP theorem tells us that when a network partition occurs we have to give up Consistency in order to keep Availability, and vice versa. Any system whose parts are separated by space and time will always have a consistency problem. We are therefore interested in establishing a few consistency definitions:

Strong Consistency: All members of a system must agree on the state, before it becomes available. We create strong consistency in a system by introducing contention, which in turn limits availability.

Note: Traditional monoliths are based around strong consistency.

Eventual Consistency: After a system stops receiving updates, you can guarantee that all parts of the system will eventually converge on the same state.

With microservice design and the help of the Akka toolkit we can be very clear about where, and for which resources, we make these tradeoffs. For example, typebus steers us toward a common design pattern: write entities are made strongly consistent by way of sharding, while reads are allowed to be eventually consistent.
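The write/read split can be illustrated with a small, single-process sketch. This is not typebus or Akka code: `WriteModel`, `ReadView`, `Deposited` and the `outbox` queue are hypothetical names, and the queue stands in for whatever transport carries events to the read side.

```scala
import scala.collection.mutable

object ConsistencyDemo {
  // Hypothetical event emitted by the write side.
  final case class Deposited(accountId: String, amount: BigDecimal)

  // Write side: one owner applies updates in order, so its state is always current.
  final class WriteModel {
    private val balances = mutable.Map.empty[String, BigDecimal]
    val outbox = mutable.Queue.empty[Deposited]

    def deposit(accountId: String, amount: BigDecimal): BigDecimal = {
      val next = balances.getOrElse(accountId, BigDecimal(0)) + amount
      balances(accountId) = next
      outbox.enqueue(Deposited(accountId, amount))
      next
    }
  }

  // Read side: may be stale until it drains the events it has not yet seen.
  final class ReadView {
    private val balances = mutable.Map.empty[String, BigDecimal]

    def balance(accountId: String): BigDecimal =
      balances.getOrElse(accountId, BigDecimal(0))

    def catchUp(events: mutable.Queue[Deposited]): Unit =
      while (events.nonEmpty) {
        val e = events.dequeue()
        balances(e.accountId) = balance(e.accountId) + e.amount
      }
  }
}
```

Between a `deposit` and the next `catchUp` the read view reports the old balance. Once updates stop and the queue is drained, both sides agree, which is the eventual consistency guarantee described above.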

Message Patterns

There are a lot of differing opinions about message patterns in distributed systems. This is where I see a lot of mistakes being made. It is extremely tempting to put your Data Transfer Objects into a shared library that is common to all of your microservices. This breaks the principle of isolation and starts you down the road to what is commonly known as the distributed monolith.

First and foremost: messages in your system need to be non-blocking and asynchronous. Blocking message patterns are the first major source of resource contention and need to be removed. However, removing them does not by itself mean that you can scale, nor that you have a reactive design. It is a requirement, but only one of several.
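As a rough illustration of non-blocking composition, here is a sketch using only the Scala standard library's `Future`. The two lookup functions and their return values are made up for the example; they stand in for requests to other services.

```scala
import scala.concurrent.{ExecutionContext, Future}

object AsyncDemo {
  // Hypothetical remote lookups. Each returns immediately with a Future.
  def creditScore(userId: String)(implicit ec: ExecutionContext): Future[Int] =
    Future(720)

  def balance(userId: String)(implicit ec: ExecutionContext): Future[BigDecimal] =
    Future(BigDecimal("250.00"))

  // Combine both results without parking a thread to wait on either one.
  def eligible(userId: String)(implicit ec: ExecutionContext): Future[Boolean] = {
    val score = creditScore(userId) // both requests are started here,
    val funds = balance(userId)     // so they run concurrently
    for (s <- score; b <- funds) yield s >= 700 && b > BigDecimal(0)
  }
}
```

The caller receives a `Future[Boolean]` and attaches a callback or composes further. Calling `Await.result` inside a service would reintroduce exactly the blocking this section warns against.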

Delivery Guarantees:

There are 3 types of delivery guarantee that we will pause to define at this level:

At Most Once

  • If delivery fails, the message is lost entirely
  • Does not require any storage of that message

At Least Once

  • Will retry message until it succeeds
  • Requires message storage at the sender

At Least Once with Deduplication

  • Exactly once is impossible, but we can simulate it with at least once delivery plus deduplication
  • Requires message storage at both the sender and the receiver

Note: You will sometimes hear the term Exactly Once, which is an impossible guarantee in a distributed system (see the Two Generals Problem). We can however “simulate” it with At Least Once delivery with Deduplication.

Note: Akka message passing uses At Most Once. You can achieve the other levels of semantics at the application level (i.e. by passing ack messages and adding retry logic).
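The ack-and-retry idea can be sketched in plain Scala as follows. This is an illustration only, not Akka or typebus code: `Envelope`, `DedupReceiver` and `sendAtLeastOnce` are hypothetical names, and the `channel` function stands in for an unreliable network.

```scala
import scala.collection.mutable

object DeliveryDemo {
  // Hypothetical envelope: the id is what makes deduplication possible.
  final case class Envelope(id: Long, payload: String)

  // Receiver that remembers which ids it has already processed.
  final class DedupReceiver {
    private val seen = mutable.Set.empty[Long]
    val processed = mutable.ListBuffer.empty[String]

    // Always acks (returns true), but only processes an id the first time.
    def receive(msg: Envelope): Boolean = {
      if (seen.add(msg.id)) processed += msg.payload
      true
    }
  }

  // Sender that resends until it sees an ack or runs out of attempts.
  // `channel` returns true only if the ack made it back to the sender.
  def sendAtLeastOnce(msg: Envelope, maxAttempts: Int)(channel: Envelope => Boolean): Int = {
    var attempts = 0
    var acked = false
    while (!acked && attempts < maxAttempts) {
      attempts += 1
      acked = channel(msg)
    }
    attempts
  }
}
```

If the channel delivers the message but loses the ack, the sender retries and the receiver sees the same id again. Because the receiver stores seen ids, the payload is still processed only once, which is the storage requirement on both sides listed above.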

Message passing is achieved in Akka through a point-to-point or a publish/subscribe (bus) model. typebus, as you might have guessed, forces us to use the latter. With a bus, services are coupled to the message format and not to each other. This allows for greater isolation, autonomy and superior fast data design options.
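To make "coupled to the message format and not to each other" concrete, here is a toy in-memory bus keyed by message class. It is a sketch under stated assumptions and does not reflect the typebus API: `Bus`, `subscribe`, `publish` and `LoanApproved` are hypothetical names.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

object BusDemo {
  // Toy bus: handlers are registered per message class, not per service.
  final class Bus {
    private val handlers =
      mutable.Map.empty[Class[_], mutable.ListBuffer[Any => Unit]]

    def subscribe[A](handler: A => Unit)(implicit ct: ClassTag[A]): Unit = {
      val list = handlers.getOrElseUpdate(
        ct.runtimeClass, mutable.ListBuffer.empty[Any => Unit])
      list += ((msg: Any) => handler(msg.asInstanceOf[A]))
    }

    // The publisher does not know who, if anyone, is listening.
    def publish(msg: Any): Unit =
      handlers.get(msg.getClass).foreach(_.foreach(h => h(msg)))
  }

  // Hypothetical event shared as a message format.
  final case class LoanApproved(loanId: String, amount: BigDecimal)
}
```

A notification service and an audit service could both subscribe to `LoanApproved` without the loan service knowing that either exists. Adding a third subscriber requires no change to the publisher.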

typebus

typebus is my mildly opinionated toolkit for building reactive microservices, abstracting away a publish/subscribe bus implementation. It provides a consistent way to create and advertise service definitions. These services in turn own their API and provide guarantees about how their API may evolve. If we think of the real world as a distributed system, then typebus is a set of stories or beliefs that each person holds so that we can highly organize our society (e.g. the story of money). typebus aims to provide:

  • Faster development, with templates that map to Domain Driven Design (DDD)
  • Better isolation and API ownership
  • Simpler gateway service design and public API implementations
  • Guarantees around schema evolution
  • Flexible deployment and scale options
  • Open and composable communication channel
  • Tooling for support, visualization and debugging services

I will begin with the high level design process and how this maps to microservice boundaries. Additionally I will introduce how typebus can auto generate your service from a template. Once you have a demo service up and running we can talk about the communication and tooling that typebus enhances. Finally I will take a look at some design patterns and tradeoffs for building services.

Domain Driven Design

Domain Definition: A sphere of knowledge.

The process for this style of design dictates that we first locate the “domain experts”. These are the people responsible for understanding and operating the business domains. For example, if you are working on a FinTech backend your domain experts would be:

  • Loan specialists
  • Bankers
  • KYC individuals
  • Customer experience representatives
  • Accountants
  • Etc.

Once we have these people in a room we want to start agreeing on the terms used to describe items in each domain. It is important to note that these should not be software terms; they are the terms used in that business domain. This terminology will be used to define a “ubiquitous language” for that domain. Almost certainly there will be terms that carry different meanings between departments or organizations in the domain. Make special note of these terms, as they will aid in identifying “sub domains”. As an example, the term “Account” might have a very different meaning for the Banker and for the Customer experience representative.

Mapping our domain

One of the most fun ways to begin mapping our domain is through a process known as “event storming”.

Event Storming

The main goal of an event storming session is to capture all the domain activities. Domain activities are anything of relevance to the domain.

Types of activities that occur in a domain:

  • Commands: Requests to perform an action. They cause changes to the state of the domain. Eg: Add Item, Pay Bill
  • Events: Actions that have happened in the past. They record a change in state to the domain. Eg: Item was added, Bill was paid
  • Queries: Requests for information about a domain. They should not alter the state. Eg: Get orders, Check if bill paid
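In Scala, these three kinds of activity are often modelled as separate sealed hierarchies. The sketch below reuses the examples from the list; the type and field names are illustrative and not part of typebus.

```scala
object ActivityDemo {
  // Commands: requests to change state, named in the imperative.
  sealed trait Command
  final case class AddItem(orderId: String, item: String) extends Command
  final case class PayBill(billId: String) extends Command

  // Events: facts that have already happened, named in the past tense.
  sealed trait Event
  final case class ItemAdded(orderId: String, item: String) extends Event
  final case class BillPaid(billId: String) extends Event

  // Queries: read-only requests for information.
  sealed trait Query
  final case class IsBillPaid(billId: String) extends Query

  // Accepting a command yields the event that records its effect.
  def decide(cmd: Command): Event = cmd match {
    case AddItem(orderId, item) => ItemAdded(orderId, item)
    case PayBill(billId)        => BillPaid(billId)
  }

  // Answering a query only reads the recorded events; it never changes them.
  def answer(q: IsBillPaid, history: List[Event]): Boolean =
    history.contains(BillPaid(q.billId))
}
```

Keeping the hierarchies separate lets the compiler check that every command is handled, and makes it obvious at the type level which messages may change state.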

Further reading on event storming: https://www.eventstorming.com/resources/

In a FinTech example this could be items like:

Check Available Balance in Account

Get Credit Score for User

Issue Loan Based on Credit

Once we have a list of activities that occur in the domain, the next step is to define the “commands” or “triggers” that cause these events. These triggers should include the source of the event: User, Scheduler, Time, etc.

Next we want to decompose our domain into multiple subdomains.

Subdomain

Once we have our domain events and triggers we should start to see some common patterns. Grouping events based on these patterns should start to expose “subdomains”. Here are some tools to help with subdomain discovery:

  • Terms that had multiple meanings depending on which domain expert used them should NOT be in the same subdomain. Eg: User Account vs. Bank Account
  • We can often look at the business structure (departments) to aid in subdomain discovery: Loan Department, Bank Accounts Department.
  • Consider causality. If one event causes another, perhaps those events are rooted together.
  • Abstracting away the noun and grouping based on the verb can sometimes help. Eg: Send User email notifying them of Loan Approval; Send User SMS alert telling them they have 2 days to make minimum payment; Send push notification to app displaying rewarded points for linking account. “Send” could imply that there is some sort of “Notification” service.

Domain Objects

Once you are happy with your subdomains you can dive into each of them to identify domain objects. These objects come in 3 flavours, and generally form a hierarchy of objects for the domain.

Value Objects

These are objects that can be identified by their attributes. An example of this is an Address. If an address contains all the same attributes, it is in fact the same address.

Entity Objects

These objects require a unique key to identify them. For example, a Bank Account might contain a balance. If 2 balances are the same it does not imply that they are the same account; we therefore have an Account Number or Id that uniquely identifies this object.
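The difference between the two comes down to how equality is defined. A minimal Scala sketch, with illustrative field names:

```scala
object DomainObjectsDemo {
  // Value object: a case class compares by its attributes.
  final case class Address(street: String, city: String, postalCode: String)

  // Entity: equality is based on the account number alone, not on the balance.
  final class BankAccount(val accountNumber: String, var balance: BigDecimal) {
    override def equals(other: Any): Boolean = other match {
      case that: BankAccount => accountNumber == that.accountNumber
      case _                 => false
    }
    override def hashCode: Int = accountNumber.hashCode
  }
}
```

Two `Address` values with the same fields are equal. Two `BankAccount` instances with the same balance are different accounts unless their account numbers match, and the same account stays the same entity as its balance changes.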

Aggregate Roots

These are Root Entity Objects, and normally will end up being a dominating term from your ubiquitous language. Aggregate roots are what will receive commands to act on your subdomain. For example you may have found that you have a subdomain for your “Loans”. The aggregate for this subdomain could potentially be each individual Loan. A loan may consist of multiple ledgers. Some of the actions that might be performed could adjust a ledger.

  • Add 1.10 to interest ledger
  • Subtract 105.36 from principal ledger
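A Loan aggregate that receives these two ledger adjustments could be sketched like this. The command and ledger names are hypothetical, and the sketch is a plain immutable model rather than the actor the typebus template generates.

```scala
object LoanAggregateDemo {
  sealed trait Ledger
  case object Interest extends Ledger
  case object Principal extends Ledger

  // Commands addressed to a single Loan aggregate root.
  sealed trait LoanCommand
  final case class AddToLedger(ledger: Ledger, amount: BigDecimal) extends LoanCommand
  final case class SubtractFromLedger(ledger: Ledger, amount: BigDecimal) extends LoanCommand

  // The aggregate root is the only entry point for changing its ledgers.
  final case class Loan(loanId: String, ledgers: Map[Ledger, BigDecimal]) {
    def handle(cmd: LoanCommand): Loan = cmd match {
      case AddToLedger(l, amt)        => adjust(l, amt)
      case SubtractFromLedger(l, amt) => adjust(l, -amt)
    }

    private def adjust(l: Ledger, delta: BigDecimal): Loan =
      copy(ledgers = ledgers.updated(l, ledgers.getOrElse(l, BigDecimal(0)) + delta))
  }
}
```

Because every change goes through `handle`, the rules for a loan live in one place, and nothing outside the aggregate can put its ledgers into an inconsistent state.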

Another option might have been to group all of the Loans together based on the user. In this case you might have a “UserLoans” aggregate root. This in turn might have multiple loans with multiple ledgers.

It should also be noted that it is fine for a single subdomain to possess more than one aggregate root. The granularity you choose here is a tradeoff between service complexity and how finely you can tune each root. As you will see, typebus helps you manage some of that complexity, steering you toward choosing higher granularity.

Bounded Context

The subdomain with aggregate roots, and ubiquitous language together form a bounded context. Commands, Events, and Queries make up the messages of our system. They form the API of our bounded context. Each Bounded Context will map to one or more microservices.

TODO Loan Bounded Context

From Design to Code

The first thing I provide is a template for mapping your domain to a typebus microservice. It comes in the form of a giter8 (g8) template. Make sure you have the g8 tool installed, then run the following:

> g8 git@github.com:coreyauger/typebus-service.git
name [Awesome Service]: Loan Service

Enter the name of your service. Here we provide the name of our example, “Loan Service”.

organization [io.surfkit]: com.myorg

Next enter the organization namespace that you want to use. Typically this is your organization’s reversed domain name, used as the package name.

namespace [loanservice]:

Enter the namespace for the project. In this case the default looks fine.

squbs_version [0.11.0]

Enter the squbs version. We will talk about squbs in another post. In this case we are using the latest version and that is what we want.

route_support [yes/NO]: No

Do we need Akka HTTP router support? This could be to support a REST API or perhaps to connect a websocket, etc. In this case we don’t need this support, so we say No.

cluster_sharding [YES/no]: Yes

We do want cluster sharding support. This will allow us to define our Aggregate Root and achieve strong consistency right out of the gate for this entity.

cluster_actor [User Actor]: User Loans Actor

This is asking for our main Aggregate Root, which will become the sharded actor. In this case we choose our User Loans. We are going to send each message that has to do with a user’s loans to their UserLoansActor. Again, this will allow for strong consistency while still allowing us to scale.
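The reason sharding gives both properties can be shown with a small routing sketch. It is not the code the template generates: `LoanMessage`, `shardFor` and `route` are hypothetical names, and the hash-modulo scheme is just one simple way to map an entity id to a shard.

```scala
object ShardingDemo {
  // Hypothetical message carrying the id of the aggregate it is addressed to.
  final case class LoanMessage(userId: String, body: String)

  // Map an entity id to one of `numberOfShards` shards.
  def shardFor(entityId: String, numberOfShards: Int): Int =
    math.abs(entityId.hashCode % numberOfShards)

  // Group messages by shard. Order is preserved within each group.
  def route(msgs: List[LoanMessage], numberOfShards: Int): Map[Int, List[LoanMessage]] =
    msgs.groupBy(m => shardFor(m.userId, numberOfShards))
}
```

Every message for a given user lands on the same shard, so a single actor instance sees that user's messages in order and is the only writer of that user's state. Different users spread across shards, which is what lets the service scale across nodes.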

bus_type [kafka]:

Next we choose the bus type. At this time only Kafka is supported, but in the future there could be additional types: Kinesis, RabbitMQ, etc.

persistence [yes/NO]: Yes

Last, we choose whether to make the actor persistent. If we want to design in the CQRS manner covered later in this guide, we choose yes here.

Template applied in ./loan-service

What did this create?

The typebus service template has generated a service skeleton with some example API methods defined. This example is for a fake Book Store. It also has a data section for defining your API and DTOs (data transfer objects). All of this is seamlessly wired into the bus you chose (Kafka in our case). Additionally, your main Aggregate Root (UserLoans in our example) has been defined as a Sharded and Persistent Actor. This provides strong consistency right out of the gate.

Additionally, all of the setup for joining seed nodes and cluster configuration is done for you. The only thing you need to provide is the location of your ZooKeeper node. The config makes some sensible choices for you, which you are free to override. One such choice is that persistence is backed by a Cassandra database (you will need to have Cassandra available). In the future I may make this a template option as well.

Finally, typebus even includes a Docker recipe that allows us to build a Docker image. The toy example service could literally be built and placed into a production environment running Kubernetes, Mesos, DC/OS, Fargate, etc.

What’s Next?

To gain a better understanding of how to work with typebus, I will dive deeper into each section of the code. The next post will explore the code gen in detail and allow you to get a feel for working with typebus.

Next: Anatomy of a typebus Service

References

Lightbend Reactive Architecture: https://www.lightbend.com/learn/lightbend-reactive-architecture