Event Sourcing + CQRS: from theory to AWS — part 1

Christian Paesante
7 min read · Dec 1, 2019


With this series of posts, I’m going to explain how to successfully implement an Event Sourcing + CQRS system, exploiting the managed services offered by AWS.

In this article I’m going to quickly refresh what Event Sourcing and CQRS are and what you can achieve by combining the two. In the next article I’ll discuss my design of a general ES+CQRS architecture applied to a microservice system.

Event Sourcing and Event Store

I’ll spend some time on this. It is required to fully understand how an Event Store works, in order to be able to follow a design that is independent of the database technology used as the Event Store.

Event Sourcing ensures that all changes to application state are stored as a sequence of events. — Martin Fowler

The basic idea is that when your application has to store the state of an object, it instead stores all the events that lead back to that state. So it writes events in an append-only collection called the Event Store. You can imagine something like the following:
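Here is a minimal sketch in Python of what such a collection could look like. The record shape (streamId, eventId, type, payload) is just an illustrative assumption, not a schema imposed by any particular Event Store; streamId here simply identifies which user the event belongs to (more on streams below).

    # A hypothetical, in-memory Event Store: a plain append-only list of events.
    event_store = [
        {"streamId": "user-42", "eventId": 1, "type": "UserCreated",
         "payload": {"name": "Alice", "address": "Oak Street 1"}},
        {"streamId": "user-42", "eventId": 2, "type": "UserUpdated",
         "payload": {"address": "Maple Street 7"}},
    ]

    def append_event(store, stream_id, event_type, payload):
        # Append-only: existing events are never updated or deleted.
        next_id = sum(1 for e in store if e["streamId"] == stream_id) + 1
        store.append({"streamId": stream_id, "eventId": next_id,
                      "type": event_type, "payload": payload})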

As you can see, when you hit your /user/create endpoint, the application writes a UserCreated event containing all the initial information of the given user. When a property of the user changes, a UserUpdated event is created and appended to the collection, containing all the information related to the change. (Beware: event names should be self-explanatory, like AddressChange, NameChange, FriendshipEstablished).

With Event Sourcing it is advisable to treat your Event Store as the Single Source of Truth. So, whenever you need to check whether the user can make an update, you replay all the events of that user, compute your Aggregate (the representation of a sequence of events) and then run all the business logic validation on it. If the validation passes, you append a new event with the changes.
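Continuing the hypothetical sketch above, the write path could look roughly like this; apply() and the validation rule are made-up examples, not a prescribed API:

    def apply(aggregate, event):
        # Fold a single event into the current state of the Aggregate.
        if event["type"] == "UserCreated":
            return dict(event["payload"])
        if event["type"] == "UserUpdated":
            return {**aggregate, **event["payload"]}
        return aggregate

    def load_aggregate(store, stream_id):
        # Replay every event of the stream, from the first to the last.
        aggregate = None
        events = [e for e in store if e["streamId"] == stream_id]
        for event in sorted(events, key=lambda e: e["eventId"]):
            aggregate = apply(aggregate, event)
        return aggregate

    def update_user(store, user_id, changes):
        user = load_aggregate(store, user_id)
        if user is None:                     # business validation goes here
            raise ValueError("unknown user")
        append_event(store, user_id, "UserUpdated", changes)

Note that update_user never modifies existing records: the only write operation is an append.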

Most of the events in the collection are not related to the user we are interested in. This means that when you have to compute your Aggregate, replaying events unrelated to it is inefficient. Consequently, the full collection is split into multiple Event Streams.

An Event Stream is a collection of events that are contextually bound together. You can design them as you wish. My advice is to stick to the following rule: 1 Event Stream = 1 Business Entity. Examples: 1 stream = 1 user, 1 stream = 1 restaurant, 1 stream = 1 order.

Even if you are using multiple Event Streams, recomputing all the events every time is still inefficient, especially for streams subject to tens of writes per day (which means they grow quite fast over time). A solution for this is to keep a Snapshot of the Event Stream. A Snapshot is nothing more than an Aggregate computed up to a certain event. In this way, you only need to replay the events following the Snapshot. You update the Snapshot over time so that you only ever need to replay a maximum of 10 to 20 events.
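Still within the same hypothetical model, a Snapshot could be handled roughly as follows; the in-memory snapshots dictionary stands in for whatever storage you would actually use:

    snapshots = {}   # stream_id -> {"version": last event id, "state": aggregate}

    def load_aggregate_with_snapshot(store, stream_id):
        snapshot = snapshots.get(stream_id, {"version": 0, "state": None})
        aggregate, version = snapshot["state"], snapshot["version"]
        # Replay only the events written after the snapshot was taken.
        tail = [e for e in store
                if e["streamId"] == stream_id and e["eventId"] > version]
        for event in sorted(tail, key=lambda e: e["eventId"]):
            aggregate = apply(aggregate, event)
            version = event["eventId"]
        # Refresh the snapshot (in practice you might do this only every N events).
        snapshots[stream_id] = {"version": version, "state": aggregate}
        return aggregate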

Let’s stop here for a moment and recap the various terms:

  • Event Store: a collection of Event Streams
  • Event Stream: an append-only collection of contextually related events
  • Snapshot: a precomputed representation of an Event Stream, obtained by replaying all the events up to a certain point in the associated Event Stream. It is updated over time in order to reduce the number of events replayed at every new transaction
  • Aggregate: a complete representation of an Event Stream, obtained by replaying all the events from event 0 up to the last event of the stream. If there is a Snapshot, the Aggregate can be obtained by taking the Snapshot and applying to it all the events from the Snapshot to the end of the stream.
    The Aggregate is used to run all the business validation required to make a write in the Event Store. It can be stored as a Snapshot in order to reduce the number of events replayed in the following reads of the stream.

Advantages of Event Store

  1. It stores the history of every object of your application.
    With whatever database you are using, objects’ states are updated in place, losing every time the history of the changes applied to them. What you get here instead is a complete and reliable audit log, from which you can rebuild the application state just by replaying events.
  2. It lets you replay all the events from the start.
    This means that by replaying your events you can compute the representation of your objects in any way you want. At any time, you can change how your events are collapsed into an Aggregate without sticking to a representation dictated by the database you are using.
  3. It can be exploited as a broker of “messages” between multiple services.
    If a service is interested in the changes of a certain object, it can “keep looking” at the associated stream. In this way, any new event is atomically both written and published to the interested service (or services). A minimal sketch of such a subscription follows this list.
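As mentioned in advantage number 3, here is a very simplified sketch of that “keep looking” idea: naive polling over the in-memory collection from the earlier sketches. A real Event Store would typically offer proper push subscriptions; this just shows the principle.

    import time

    def subscribe(store, stream_id, handler, poll_interval=1.0):
        # Naive polling loop: deliver every new event of the stream, in order.
        last_seen = 0
        while True:
            new_events = sorted(
                (e for e in store
                 if e["streamId"] == stream_id and e["eventId"] > last_seen),
                key=lambda e: e["eventId"])
            for event in new_events:
                handler(event)          # e.g. update a local copy, notify a user...
                last_seen = event["eventId"]
            time.sleep(poll_interval)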

Disadvantages of Event Store

  1. It’s not suitable for general purpose queries.
    For example, the query “get all the restaurants within 500 meters of the given position” implies recomputing the Aggregate of every Event Stream and filtering every aggregate by its distance from the given point.
  2. Event versioning.
    Any time you need to change the schema of an event, how do you handle it? There are multiple solutions described here, but generally speaking it’s not so straightforward.

CQRS

Every database sucks. — Greg Young

Let’s start from disadvantage number 1 and advantage number 2 of an Event Store. The first says that you can’t use the Event Store as a database for querying data (apart from the events themselves). But the second tells you that whenever you replay your events you can always decide which representation best suits you.

So why don’t we replay all our events, applying them to a representation suitable for our queries and kept in another database (relational, document-oriented, graph, …)? And why don’t we keep this representation eventually consistent by applying new incoming events, as mentioned in advantage number 3 (Event Store as a message broker)?

Well, you can achieve even more than that. What if you need more than one data representation? You can run as many representations as you want, keeping them eventually consistent by subscribing to the new incoming events.
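As a rough sketch of one such representation, again reusing the hypothetical in-memory collection from the earlier examples: users_view plays the role of the query-optimized model, and subscribe() is the polling sketch from above.

    users_view = {}   # stream_id -> current user document (the read model)

    def project(event):
        # Apply one event to the read model; safe to replay from event 0.
        if event["type"] == "UserCreated":
            users_view[event["streamId"]] = dict(event["payload"])
        elif event["type"] == "UserUpdated":
            users_view.setdefault(event["streamId"], {}).update(event["payload"])

    # Bootstrapping a new representation: replay everything from the start...
    for event in sorted(event_store, key=lambda e: (e["streamId"], e["eventId"])):
        project(event)

    # ...then keep it eventually consistent by feeding it new incoming events,
    # for example with subscribe(event_store, "user-42", project).

    # Queries now hit the read model, not the Event Store:
    users_on_maple = [u for u in users_view.values()
                      if u.get("address", "").startswith("Maple")]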

Even more, you can start any new representation without system downtime, by subscribing to new incoming events, replaying all the events from the start and applying them in order (with some deduplication mechanism). Once the new representation is ready, it will be kept eventually consistent and you can start using it. And here we come to CQRS.

Command Query Responsibility Segregation is a pattern that decouples the write side from the read side. It tells you to separate commands (writes) from queries (reads) in order to sustain bigger loads.

What you need to do is use different models for reads and writes in order to simplify design and implementation. It’s also possible to physically separate the read data from the write data, which means having a database for queries that is kept eventually consistent with the write-side database. The database for queries can be thought of as a materialized view.

This eventual consistency is generally achieved by publishing an event from the write model every time it is updated. The update of the model and the publication of the event must be done atomically, which is one of the things an Event Store gives us. You can read more on CQRS here.

Advantages of CQRS

  • Independent scalability between reads and writes and independent performance optimization
  • Optimized schemas for queries
  • More flexibility in designing read and write models

Disadvantages of CQRS

  • Additional complexity
  • Eventual consistency

Conclusions

Event Sourcing combined with CQRS gives you flexibility in query design, native event-based communication, and decoupled read and write sides for improved throughput.

In the next post I'll discuss applying it in a microservice architecture, exploring the additional benefits this combination brings to that architecture.
