Event Sourcing: The best way for StashAway?

Published in

StashAway Engineering

10 min readMar 2, 2018

Introduction

FinTech.

For better or worse, this is probably one of the more popular buzzwords of this decade. A shorthand for some form of digital revolution, poised to sweep away antiquated banking practices. Or so, that is what the mainstream media told me. But what they also conveniently left out about FinTech startups is the Catch-22 at the heart of its popularity.

Trust is the key currency for all financial institutions. Lose trust, and you will lose customers faster than you can cope. Earn trust, and you will gain customers faster than you can cope as well.

In the StashAway backend team we face the challenge of designing a sufficiently rigorous trading system that could somehow face the rigours of the most stringent of financial audits, as well as scale to meet our growing customer base without sacrificing performance.

CRUD

The conventional way of designing a backend system usually involves 4 standard operations: Create, Read, Update, Delete. A model will typically have methods that embody these operations. For example, let us create a very simple model of a bank:

class Bank {  var balance = 0
  
  def getBalance() = {
    this.balance
  }
  
  def deposit(amount: BigDecimal) = {
    this.balance += amount
  }
  
  def withdraw(amount: BigDecimal) = {
    this.balance -= amount
  }}

In this case, we have a Create case, where the amount is initialised as 0. A Read case where we get the current balance. And Update cases, where we can deposit and withdraw money respectively. Pretty decent.

And of course, you have the database to persist whatever output is produced by the model. This is usually done by a mapping between the model and the relational database. But for now, we can just observe the possible queries that our model will call.

CREATE TABLE bank (amount bigdecimal)SELECT amount FROM bankUPDATE bank SET amount = ?

Pretty straightforward.

When a user deposits $200 into the bank, the model then updates its state balance to $200. This change is persisted into the database, where a single value of $200 is stored.

If the same user then withdraws $50 from the bank, the model then updates its state balance to $150. This change is persisted into the database, where the value of $200 is updated to $150.

We have gotten the end result that we wanted; namely that the customer is able to update his/her balance.

However, on the technology side of things, the ends may not justify the means. We have built a system where there are 2 places storing the balance of $150. The model and the database. This poses its own problem, namely that of object relational impedance mismatch. But the big game changer is: how do we derive the transaction history of the customer, if all we are doing is storing, and updating the current state? There are countless of permutations which will result in a customer having a bank account balance of $150, and there is no way anyone is able to derive anything from just the current state. And in the financial industry it is absolutely imperative that we are held accountable to track each transaction being made.

So, we may probably add an audit log. Whenever the CRUD methods are called, it will append the actions taken to a log file. For simplicity’s sake, let us assume that the output is a CSV file.

ACTION,AMOUNT,BALANCE
DEPOSIT,200,200
WITHDRAW,50,150

There. But now, there are at least 3 places where we store the balance now (one extra place in the audit log). There are so many moving parts and potential points of failure, all for a simple Bank object.

Event Sourcing

Perhaps a redesign may be necessary. So let us review what we have so far. We know that the amount starts off as 0. We know that a user can take two actions: Depositing and withdrawing.

The more astute amongst you would have noticed that by setting the initial state as 0, and replaying the audit log, we can basically recreate the model which we are currently reading from. So, why not just have everything be driven from the audit log?

So this is where event sourcing comes in. The fundamental idea behind event sourcing is to ensure that every change to a state is captured in an event object. These event objects are in turn stored in the sequence they were applied in for as long as we need the state.

Tech Stack

For StashAway, we have decided to adopt Lightbend’s Lagom framework because it abstracts the mechanics of event sourcing sufficiently for the developers to just focus on the domain logic.

So our tech stack is as follows:

sbt build tool
Play Framework
Akka (for clustering, streams, persistence)
Kafka
Cassandra (our distributed database for persistence)
Scala
Guice
Jackson

Event Sourcing flow

The flow of logic from request to persistence depends on 4 things: Entity, Command, Event, State

Command: Think of it like a request from a user. “Would you please deposit 200 dollars” “Would you please withdraw 50 dollars?” How the program handles the command, and whether the command leads to an actual change, is handled by programmatic logic. This will be made clearer in the sample code.

Event: Think of this like a fact that has already happened. “A deposit event has occured with amount: $200”. “A withdrawal event has occured with amount: $50.” That means at one point of time, a $200 deposit was received, followed by a $50 withdrawal. We can replay these events from the initial state, to obtain the state that we want.

Each event is immutable and stored in an events journal. Meaning, that all we will be doing is appending events. There will be no mutating or deleting of past events, just appending of new events.

For example, in the case of a deposit reversal of $100. Instead of deleting the $100 deposit event, or changing the amount to $0, we will have a separate event that may be named DepositReversal, which when emitted will reverse the deposit. Accountants will be familiar with the concept of correcting entries.

State: This keeps track of what has happened. Events will influence state. For example, you will keep track of the balance. When a deposit event occurs, you increase the balance you have. If I play 100 add deposit events of $1 each, you will have a state that increases with $1 with each event replayed, until you reach the last state of $100.

To summarise, this is the generic logic flow in the event sourcing framework:

Command → Event → State

Command: “Would you deposit $200” → Event: “Deposit Event has occured of amount $200” → State: You have a balance of $200

Command: “Would you withdraw $300” → Exception: “NegativeBalanceException: You are unable to have a negative balance” → State: You have a balance of $200

Command: “Would you withdraw $50” → Event: “Withdrawal Event has occured of amount $50” → State: You have a balance of $150

To tie these 3 together, we need the concept of an Entity:

An entity (identified by a unique id) has a state, as well as command and event handlers. It is basically an abstract construct of a business entity, that implements the sequential flow above to achieve a business aim. It will listen to commands, interpret them, fire off relevant events, and change its state accordingly. Should the entity die a very very horrible death, we will then create a new entity, replay the events, and it will be the same as the old entity.

Sample Code

For this, we will be redrawing the flow from before where a user is able to deposit and withdraw from a bank.

Command

This is where we create all commands that can be used by the Bank model. In this case, we are adding the deposit and withdrawal commands which has an amount parameter, which is usually filled in by a request call.

sealed trait BankCommandcase class SubmitDeposit(amount: BigDecimal) extends PersistentEntity.ReplyType[Done] with BankCommandcase class SubmitWithdrawal(amount: BigDecimal) extends PersistentEntity.ReplyType[Done] with BankCommand

Event

This is where we create all events that can be used by the Bank model. In this case, we are adding the deposit and withdrawal events, that will take in both the amount, as well as a timestamp for tracking and audit purposes.

These are also the events that we will store and replay, as these events are the ones that drive the changes in the state.

sealed trait BankEventcase class DepositReceived(amount: BigDecimal, timestamp: Instant = Instant.now()) extends BankEventcase class WithdrawalReceived(amount: BigDecimal, timestamp: Instant = Instant.now()) extends BankEvent

State

This is where we keep track of the state of the bank. We have an initiate state where the balance can be set to 0. There are also helper functions for us to update this state balance.

case class BankState(balance: BigDecimal) {def init() = {
    balance = 0
}def receiveDeposit(amount: BigDecimal): BankState = {
    BankState(amount = balance + amount)
}def receiveWithdrawal(amount: BigDecimal): BankState = {
    if (balance - amount < 0){
        throw new NegativeBalanceException("Unable to process withdrawal")
    } else {
        BankState(amount = balance - amount)
    }
}

Entity

This is where everything ties together. We have a BankEntity that takes in the various commands, events, and state that we have defined previously.

Lagom has also given us an out of the box method which initialises the state of the entity at the very start, as well as a behaviour method which dictates how the entity will react differently to the same commands and events, based on the current state. For now, we will just handle everything using the default method handleActive.

In the handleActive method, as soon as a command is received, a corresponding event is emitted. This event will subsequently change the state.

In the case of SubmitDeposit command being received, the event DepositReceived is created and persisted with the amount passing from the command to the event.

On the event DepositReceived being persisted, state.receiveDeposit will be called, which updates the balance of BankState with the amount passed in from the event.

class BankEntity extends PersistentEntity {    override type Command = BankCommand
    override type Event = BankEvent
    override type State = BankState    override def initialState: BankState = {
        BankState.init
    }
    
    override def behavior: Behavior = {
        handleActive
    }    private def handleActive: Actions = {
        Actions().onCommand[SubmitDeposit, Done] {
            case (SubmitDeposit(amount), ctx, state) => 
                ctx.thenPersist(
                    DepositReceived(amount), 
                    (evt: DepositReceived) => ctx.reply(Done)
                )
        }.onCommand[SubmitWithdrawal, Done] {
            case (SubmitWithdrawal(amount), ctx, state) => 
                ctx.thenPersist(
                    WithdrawalReceived(amount), 
                    (evt: WithdrawalReceived) => ctx.reply(Done)
                )
        }.onEvent{
            case (evt: DepositReceived, state) => 
                state.receiveDeposit(evt.amount)
            case (evt: WithdrawalReceived, state) => 
                state.receiveWithdrawal(evt.amount)
        }
    }
}

Service

For us to access the above entity, we can build a simple service layer for the users to submit their inputs.

trait BankService extends Service {
    override def descriptor(): Descriptor = {
        named("bank").withCalls(
           restCall(Method.POST, 
                    "/api/bank/:bankId/deposit/:amount",
                    submitDeposit _),
           restCall(Method.POST, 
                    "/api/bank/:bankId/withdrawal/:amount",
                    submitWithdrawal _)
        )    def submitDeposit(bankId: UUID, amount: BigDecimal): 
        ServiceCall[NotUsed, Done]    def submitWithdrawal(bankId: UUID, amount: BigDecimal):
        ServiceCall[NotUsed, Done]}

Implementation

This is where we handle the service implementation. We pass in the entity registry, which keeps track of all the entities in the microservice.

We then register the needed entities when the service is started up, in order for us to send it commands. In this case, we only have one entity, BankEntity.

We finish the abstract methods defined in BankService, where they will find the corresponding BankEntity and issue it the relevant commands.

class BankServiceImpl (persistentRegistry: PersistentEntityRegistry)(implicit ec: ExecutionContext) extends BankService {    persistentRegistry.register(classOf[BankEntity])    override def submitDeposit(bankId: UUID, amount: BigDecimal): 
                 ServiceCall[NotUsed, Done] = request => {        val ref = persistentRegistry.refFor(
                      classOf[BankEntity], 
                      request.bankId
                  )        ref.ask[Done, SubmitDeposit](SubmitDeposit(request.amount))    }    override def submitWithdrawal(bankId: UUID, amount: BigDecimal): 
                 ServiceCall[NotUsed, Done] = request => {        val ref = persistentRegistry.refFor(
                      classOf[BankEntity], 
                      request.bankId
                  )        ref.ask[Done, SubmitWithdrawal]                   
               (SubmitWithdrawal(request.amount))    }}

Sample Deposit flow

When a user calls /api/bank/:bankId/deposit/:amount
BankService processes the request and hands it to BankServiceImpl to finish
BankServiceImpl looks up its entity registry, and finds the relevant BankEntity based off the bankId passed in the by the user
BankServiceImpl creates SubmitDeposit command with the requested amount and submits the command over to BankEntity
BankEntity handles the SubmitDeposit command, creates the DepositReceived event and persists it
BankEntity handles the DepositReceived event and increments the balance in BankState

This is handled similarly for the withdrawal flow.

At the very end of our use case, we will have two events that we persist in our database.

DepositReceived(amount: 200, timestamp: 2018-01-01 05:00:00)
WithdrawalReceived(amount: 50, timestamp: 2018-01-02 13:00:00)

In the case that a tragic event happens and our current BankEntity goes berserk, we can always spin off a new entity, initialise it, and replay the events. We can replay events up till 2018–01–01, and we will see the state with a balance of $200. We can replay till the very end, and we will see the state with a balance of $150.

In short, we can step through the actions that a user can take, and observe the tangible changes that it makes to the state.

In the case of auditing, all we need to do is hand over our events journal. After all, the entire system state is built on the events journal, which makes it our only source of truth.

Conclusion

Event sourcing is quite a different approach as compared to the conventional CRUD method that we were most likely brought up on. But the potential that it can bring to the table is something that will more than offset the learning costs. Given the rise of many robust event sourcing frameworks, such as Lagom, the barrier to entry to implementing event sourcing is getting lower and lower with each passing day.

As a developer, I highly encourage all tech teams in highly regulated industries to really take a good, long, hard look at adopting event sourcing. At StashAway, using this framework has allowed the tech team to breeze through many otherwise uncomfortable business and regulatory requirements, and just really focus on developing new and exciting features for our product.

Given that adopting event sourcing alone is insufficient, we will also follow this post up with one on CQRS, where we will discuss the many pros and cons of segregating updates and display, and how this dovetails perfectly with event sourcing for us to build a robust distributed system.

We are constantly on the lookout for great tech talent to join our engineering team — visit our website to learn more and feel free to reach out to us!