The Good, the Bad and the Ugly: How to choose an Event Store?

Frank Steimle
Digital Frontiers — Das Blog
9 min read · Jun 25, 2021

CQRS/ES is a very popular architectural pattern. But what has to be considered when choosing an event store?


Introduction

Command-Query-Responsibility-Segregation (CQRS) is a very popular architectural pattern for implementing microservices. Its core idea is to have separate models for read and write operations (aka queries and commands). This improves scalability, because you can scale up exactly those models that are used most frequently. It also improves extensibility, because you can adapt the relevant models (or introduce new ones) when you need to add features.

CQRS is often considered an enabler for Event Sourcing (ES). Event Sourcing is not an architectural pattern, but a way of persisting data within your application. In contrast to applications based on a 3NF database design, ES does not store the current state: it stores all events that occurred in your application and led to the current state. In this way, ES increases the extensibility provided by purely CQRS-based solutions even further.
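
To make this concrete, here is a minimal sketch in Java of the core idea: the current state is not stored, it is derived by replaying the events. The event types for a bank account are invented for illustration.

```java
import java.util.List;

public class StateFromEvents {
    // Hypothetical domain events; in a real system these would be your own types.
    sealed interface AccountEvent permits Deposited, Withdrawn {}
    record Deposited(long amountCents) implements AccountEvent {}
    record Withdrawn(long amountCents) implements AccountEvent {}

    public static void main(String[] args) {
        // The event store keeps the full history ...
        List<AccountEvent> history = List.of(
                new Deposited(10_000), new Withdrawn(2_500), new Deposited(500));

        // ... and the current state is a fold over that history.
        long balanceCents = 0;
        for (AccountEvent e : history) {
            balanceCents += (e instanceof Deposited d)
                    ? d.amountCents()
                    : -((Withdrawn) e).amountCents();
        }
        System.out.println("current balance: " + balanceCents); // 8000
    }
}
```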

Figure 1: Architecture of a CQRS/ES-based micro service

Imagine a CQRS-based microservice with three read models, where each read model is designed for a different use case. Figure 1 shows the basic architecture of this application. There is an interface which routes commands to the write model. If a command can be applied to the current state, the application saves the resulting event to the event store. After saving, the event is applied to the relevant read models. The read models are used to answer the queries that the interface routes to them.

Now imagine that after two years you discover a fourth use case, from the subject area of analytics, and want to implement it. In a state-based CQRS system, your new feature can only provide insights from the day it goes live: you cannot derive insights from the data gathered during the first two years of your service (or at least you cannot utilize the full potential of your new feature). If you add this fourth feature to a CQRS/ES-based system instead, you can use the data you have gathered since day one. Instead of starting with an empty fourth read model, you hydrate the new read model from the very beginning of your event stream.
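
As a sketch of that hydration, assume an invented analytics read model and event type: the new read model starts empty and is filled by replaying the full stream from day one.

```java
import java.util.List;

public class AnalyticsReadModel {
    // Hypothetical event type, stands in for whatever your domain emits.
    record Withdrawn(String accountId, long amountCents) {}

    private long totalWithdrawnCents = 0;
    private long withdrawalCount = 0;

    // Apply a single event to the read model.
    public void apply(Withdrawn event) {
        totalWithdrawnCents += event.amountCents();
        withdrawalCount++;
    }

    public static void main(String[] args) {
        // Stand-in for two years of history, read sequentially from the event store.
        List<Withdrawn> fullStream = List.of(
                new Withdrawn("a-1", 2_500), new Withdrawn("a-2", 9_900));

        // The brand-new read model is hydrated from the very first event onwards.
        AnalyticsReadModel model = new AnalyticsReadModel();
        fullStream.forEach(model::apply);
        System.out.println(model.withdrawalCount + " withdrawals, "
                + model.totalWithdrawnCents + " cents in total");
    }
}
```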

Besides providing a collection of all your events, Event Sourcing has the benefit that write accesses take less time (appending an event is usually “cheaper” than updating the whole state). Additionally, you implicitly get an audit trail, which helps with all kinds of regulatory scenarios.

As usual, all these benefits come at a cost: read access to your database can become expensive as the amount of data grows over time. But there are concepts to cope with this issue, like snapshots. At a technical level, ES places special requirements on the storage solution that holds all these events: it needs to manage large amounts of data and support special kinds of operations.
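
Here is a minimal sketch of the snapshot idea (all names invented): persist the state as of some global sequence number, then replay only the events that occurred afterwards.

```java
import java.util.List;

public class SnapshotReplay {
    record Snapshot(long upToGlobalSeq, long balanceCents) {}
    record Deposited(long globalSeq, long amountCents) {}

    public static void main(String[] args) {
        // Instead of replaying all events since day one ...
        List<Deposited> events = List.of(
                new Deposited(1, 100), new Deposited(2, 200), new Deposited(3, 300));

        // ... load the latest snapshot and replay only what happened after it.
        Snapshot snapshot = new Snapshot(2, 300); // state after events 1 and 2
        long balance = snapshot.balanceCents() + events.stream()
                .filter(e -> e.globalSeq() > snapshot.upToGlobalSeq())
                .mapToLong(Deposited::amountCents)
                .sum();
        System.out.println("balance = " + balance); // 600
    }
}
```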

Requirements for an event store

Let’s start with a more in-depth look at the requirements an event store needs to fulfill:

  • At some point in time we have to hydrate our read models: on start-up if they are held in memory, or, if they are persisted, via delta updates or a complete recomputation when we add a new feature. All of these cases have in common that we need a full sequential read of the events in our event store in order to get an up-to-date read model. The only difference is that in one case we read all events from the event store, while in the other case (delta updates) we only read the events which occurred after a certain point in time.
  • Before any new event is appended to the event store, the application needs to decide whether the desired change is valid according to the current state. That is, before we can apply a change to an entity, we have to restore the entity’s current state from the event store and validate the change against it. Therefore, we need to be able to efficiently read all events for a given entity (typically a domain aggregate).
  • Once we have decided to accept a change to an entity, we need to persist that change. This means we create a new event and append the event to the event store. Since the whole concept of Event Sourcing can only work if we do not alter the sequence of events, there must not be an operation like update or delete. If we want to delete something from our dataset, we have to trigger a deletion event which then gets appended to the event store.
  • Since a desired change may cause not just one but several events to be emitted, we need the event store to be able to store multiple events in one atomic operation. This makes sure that our application cannot reach an invalid state in which some events were persisted while others emitted at the same time somehow got lost. Generally speaking, we need the append operation to support the atomicity and durability of ACID transactions.
  • Event Sourcing-based applications produce a lot of data in the form of events. The longer the application runs, the more data needs to be managed. The event store therefore must be scalable in terms of data.

In summary, an event store needs to provide operations for full sequential reads, for reading all events concerning a given entity, and for appending events to the store. Additionally, an event store should support storing multiple events atomically and should be scalable.
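
Condensed into code, these requirements could look like the following interface. This is only a sketch, not the API of any existing product; all names and types are illustrative.

```java
import java.util.List;

public interface EventStore {

    // Full sequential read in global order; pass 0 to read everything,
    // or a later position to do a delta update of a persisted read model.
    List<StoredEvent> readAll(long afterGlobalSeq);

    // All events of one entity (aggregate), in per-entity order, used to
    // restore its current state before validating a command.
    List<StoredEvent> readByEntity(String entityId);

    // Append one or more events atomically: either all become visible or none.
    // There is deliberately no update and no delete operation.
    void append(String entityId, List<byte[]> payloads);

    // Envelope for a persisted event.
    record StoredEvent(long globalSeq, String entityId,
                       long entitySeq, byte[] payload) {}
}
```

Note the deliberate absence of update and delete, and that append takes a list of payloads so that multiple events can be stored in one atomic operation.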

Evaluation of well-established technologies

If you are developing an Event Sourcing-based application, you can choose from a broad variety of storage technologies. Of course you can decide to use a native event store like EventStoreDB. But there can be a variety of reasons not to do so: perhaps your manager does not believe that the chosen event store is future-proof, or does not want to hire additional personnel to operate the new technology. In most cases, some storage technology is already in place that you can use instead of a native event store.

In the following, we will see whether well-established database technologies such as relational databases or document-based databases satisfy our requirements. Since relational database technology is largely standardized, the findings in the paragraph about Postgres apply to all relational database management systems. The same cannot be said for document-based database management systems, so we will evaluate MongoDB, a well-known representative of that category. Additionally, we will take a look at Kafka, since many blog posts out there explain how to build CQRS/ES systems using Kafka.

Postgres as an example of an RDBMS

Relational database management systems have been around for quite some time, and nearly every developer knows how to use them. In addition to being easy to use, the data structure required for Event Sourcing is very simple: you only need one table containing a global sequence number, an entity identifier, a sequence number per entity and the payload (i.e. the event) itself. Using this data structure, depicted in Figure 2, you can easily write queries which efficiently perform a full sequential read or a search for all events of a given entity ID. To realize the append-only policy, you simply do not use any UPDATE or DELETE statements. RDBMS is also a very mature technology: it provides ACID transactions, and there are established ways to ensure scalability.

Figure 2: Data structure used for persisting data of an ES-based application in a RDBMS
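
As a sketch of this setup via JDBC (assuming the PostgreSQL JDBC driver on the classpath; the connection details, schema details and payloads are made up):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class PostgresEventStore {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/es", "app", "secret")) {
            conn.createStatement().execute("""
                CREATE TABLE IF NOT EXISTS events (
                  global_seq BIGSERIAL PRIMARY KEY,   -- global sequence number
                  entity_id  TEXT   NOT NULL,         -- which entity/aggregate
                  entity_seq BIGINT NOT NULL,         -- sequence number per entity
                  payload    TEXT   NOT NULL,         -- the serialized event
                  UNIQUE (entity_id, entity_seq)      -- rejects concurrent writers
                )""");

            // Append two events atomically: one transaction, all or nothing.
            conn.setAutoCommit(false);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO events (entity_id, entity_seq, payload) VALUES (?, ?, ?)")) {
                ps.setString(1, "account-42"); ps.setLong(2, 1); ps.setString(3, "{\"type\":\"Deposited\"}");
                ps.addBatch();
                ps.setString(1, "account-42"); ps.setLong(2, 2); ps.setString(3, "{\"type\":\"Withdrawn\"}");
                ps.addBatch();
                ps.executeBatch();
            }
            conn.commit();
        }
    }
}
```

Reading is then plain SQL: ORDER BY global_seq for a full sequential read, WHERE entity_id = ? ORDER BY entity_seq for all events of a single entity.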

In summary, we can say that RDBMS can easily be used as an event store, since it satisfies all the requirements natively.

MongoDB as an example of document-based databases

MongoDB is one of the most popular NoSQL databases. Its main characteristics are high scalability via sharding and schema-less storage. Thanks to the schema-less storage, you can easily save events to MongoDB. But if you want to save multiple events at once, there is a problem: MongoDB natively only supports single-document transactions. So you can either save multiple events in a single document or save them as multiple documents. Both approaches have their drawbacks: the first increases the complexity of searching for all events of a given entity; the latter forces you to use MongoDB’s newer multi-document transactions feature and to use it “right”, which also increases complexity. And while we are talking about reading events concerning an entity: MongoDB also has no support for a global sequence number, so in order to realize a full sequential read of all events, you have to implement that logic yourself.
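
For illustration, here is a sketch of the “one document per event” variant using the synchronous MongoDB Java driver and its multi-document transactions (which require MongoDB 4.0+ and a replica set); the database, collection and field names are invented.

```java
import com.mongodb.client.ClientSession;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;
import java.util.List;

public class MongoAppend {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> events =
                    client.getDatabase("es").getCollection("events");
            try (ClientSession session = client.startSession()) {
                session.startTransaction();
                try {
                    // Both inserts become visible together, or not at all.
                    events.insertMany(session, List.of(
                            new Document("entityId", "account-42")
                                    .append("entitySeq", 1L)
                                    .append("payload", "Deposited"),
                            new Document("entityId", "account-42")
                                    .append("entitySeq", 2L)
                                    .append("payload", "Withdrawn")));
                    session.commitTransaction();
                } catch (RuntimeException e) {
                    session.abortTransaction();
                    throw e;
                }
            }
        }
    }
}
```

Note that even with this in place there is still no global sequence number; maintaining one yourself is exactly the extra code mentioned above.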

In conclusion, MongoDB requires additional effort if it is to be used as an event store.

Kafka

Kafka is also a very popular tool. It is a distributed solution for processing and storing data streams: you publish messages to topics, and others consume messages from these topics. Due to its distributed nature and its focus on messages, it seems a perfect match for a tool that has to cope with events and has to be scalable. But if you take a more in-depth look, there are some downsides:

  • If you use a single topic for storing your events, you certainly can perform a full sequential read of all events, but you also have to do exactly that whenever you want to read the events of a single entity.
  • If you want to optimize reading events of a single entity, you can, e.g., create a topic for each entity, but then you have to restore the global order of events by yourself.
  • Another way to optimize reading events is to use partitions for your topic. Kafka ensures that messages with the same key are stored in the same partition, and you can tell your Kafka client to read from a specific partition (see the sketch after this list). The downside of this approach is that you need to guarantee an even distribution of your entities across the partitions. That is, you need to know something about your entities which assures even distribution, you need to write the partitioning function based on it, and you need hardware that can compute it efficiently.
  • Also, if you use Kafka as an event store, you have to consider that everyone who has access to the Kafka instance could read the topics you are using to save your events.
  • Last but not least, using Kafka as an event store means ignoring an essential rule of ES-based applications: before you publish an event, you have to store it. With Kafka you cannot ensure that an event is stored before it is published, since these are no longer separate steps. If your Kafka instance fails while you try to store and publish an event, the event is lost and you will never know. If the storage fails in a non-Kafka-based application, you will notice it, deal with it, and publish the event afterwards. You could try to use Kafka acknowledgements, but that requires replication and therefore more hardware resources.
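
To illustrate the partitioning point from the list above, here is a sketch using the official Kafka Java producer; the broker address and topic name are made up. Same key, same partition; so per-entity ordering within a partition comes for free, but a global order across partitions, and the store-before-publish guarantee, do not.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class KafkaEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait for replicas; costs latency and hardware

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key ("account-42") => same partition => per-entity ordering.
            producer.send(new ProducerRecord<>("events", "account-42", "Deposited"));
            producer.send(new ProducerRecord<>("events", "account-42", "Withdrawn"));
            producer.flush();
        }
    }
}
```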

Wrapping up, we can say that although Kafka is often recommended for CQRS/ES-based solutions, it is not a good choice for ES-based applications.

Conclusion

To realize ES, the storage solution, aka the event store, must provide capabilities to read all events sequentially, to read all events related to a specific entity, and to append events to the store. Additionally, it must provide transactional capabilities when appending multiple events at once, and it must be scalable in terms of the total number of events.

We have evaluated multiple technologies, namely RDBMS, MongoDB and Kafka, against these requirements. Our findings show that Kafka, although frequently recommended for CQRS/ES solutions, is not a good choice for building an ES-based application. MongoDB can be used, but it adds complexity to the application, since some ES features need extra code. If you want to build an ES-based application and do not want to use a native event store, an RDBMS should be your first consideration.

Thanks for reading. Do you have any experience in building solutions that use Event Sourcing? What do you think about the requirements I presented? I’m looking forward to your feedback or your questions in the comments! More interesting articles will appear on the Digital Frontiers blog and will also be announced via Twitter.
