Stop overselling Event Sourcing as the silver bullet to microservice architectures

Oskar uit de Bos
The Startup
Published in
7 min read · Jun 29, 2020

Software engineering, as a culture, seems determined to sell each other on solutions based purely on some benefits, without consideration of context, complexity and cost. If you are not using complicated architecture pattern X or infinitely scalable message broker Y, you are doing microservices wrong. So, contrary to popular belief, engineers are better at delivering (one-sided) sales pitches than most think. All jokes aside, it’s problematic. It leads to over-engineered microservice architectures and it completely disregards the trade-offs, which hurts engineering teams who are not prepared to deal with those trade-offs.

A good example of this is the combination of the Event Sourcing and Command Query Responsibility Segregation (CQRS) architecture patterns, often presented as the silver bullet for solving many of the robustness, performance and decentralized data challenges that microservice architectures face. Yet there is hardly any talk about the ton of hidden complexity these patterns bring. So, how do these patterns handle robustness, performance and decentralized data challenges? Let’s explore.

Typically, loose coupling of microservices is achieved by communicating over HTTPS using the REST pattern and versioning the endpoints a microservice exposes. This enables microservices to evolve freely without breaking each other, solving the dependency hell that many monolithic systems struggle with. Despite loose coupling, microservices still depend on each other for data.

For example, an order microservice in an e-commerce system needs customer data from the customer microservice. These dependencies between microservices are not ideal. Other microservices can go down, and synchronous RESTful requests over HTTPS do not scale well due to their blocking nature. If there was a way to eliminate dependencies between microservices completely, the result would be a more robust architecture with fewer bottlenecks.

Enter Event Sourcing and the Command Query Responsibility Segregation (CQRS) architecture patterns. These patterns help implement asynchronous event-based communication to push data updates to all microservices that are interested in that data. The order microservice from the previous example would listen for events with changes in customer data and use that to update its own local copy of customer data. The figure below illustrates what that would look like on an abstract level.

Figure 1: Overview of Event Sourcing and CQRS architecture patterns

Now the order microservice no longer needs to make a RESTful HTTPS call to the customer microservice to get that data. These patterns also provide ways to recover from data (in)consistency problems that can and will happen at some point when data is duplicated across multiple microservices.

To summarize, these patterns can indeed help overcome some fundamental challenges with microservices. However, as mentioned before, there is a lot of hidden complexity with these architecture patterns. While I’ve briefly explained what Event Sourcing and CQRS can do, I need to explain how they work in more detail before diving into the complexities.

How does Event Sourcing work?

Event Sourcing is different in how state is managed. Systems that don’t use event sourcing store the current state in a database and update that state directly when a change is made, overwriting the previous value. In these types of systems, the database is the single source of truth. Event sourcing instead stores data as a sequence of immutable events. To get the current state, the sequence of events (also called the event stream) is processed from beginning to end.

Take, for example, ordering a product on an e-commerce platform. The user finds the product and places it in the cart. Initially, the user selects same day shipping, but later reconsiders and selects regular shipping before confirming the order. Assuming the platform captures and stores all these actions, the figure below illustrates the differences in state between an Event Sourced system and one that only stores the current state:

Figure: differences in state between Event Sourcing and only storing the current state

The key difference is that when storing current state in a database, every time the state is updated, the older information is lost. That is not the case with event sourcing. The fact that the user initially selected same day shipping is preserved as an immutable event in the event stream. In Event Sourced systems, the events form the single source of truth, with the unique capability to fully rebuild current state from scratch at any point in time.
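To make this concrete, here is a minimal Python sketch of rebuilding current state by replaying an event stream from beginning to end. The event names and fields are illustrative inventions for the shopping example above, not from any particular framework:

```python
from dataclasses import dataclass

# Illustrative, hypothetical events for the e-commerce example.
@dataclass(frozen=True)
class ProductAddedToCart:
    product_id: str

@dataclass(frozen=True)
class ShippingSelected:
    method: str  # e.g. "same-day" or "regular"

@dataclass(frozen=True)
class OrderConfirmed:
    pass

def rebuild_state(events):
    """Fold the immutable event stream into the current state."""
    state = {"cart": [], "shipping": None, "confirmed": False}
    for event in events:
        if isinstance(event, ProductAddedToCart):
            state["cart"].append(event.product_id)
        elif isinstance(event, ShippingSelected):
            state["shipping"] = event.method  # later events override earlier ones
        elif isinstance(event, OrderConfirmed):
            state["confirmed"] = True
    return state

stream = [
    ProductAddedToCart("sku-42"),
    ShippingSelected("same-day"),  # preserved in the stream...
    ShippingSelected("regular"),   # ...even though it is overridden here
    OrderConfirmed(),
]
print(rebuild_state(stream))
```

Note that the same-day shipping choice is never deleted: the current state simply no longer reflects it, but replaying the stream up to any earlier point would.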

The drawback is that reading hundreds or thousands of events every time the current state is needed does not work. Performance would degrade over time as more events are appended, leading to scaling issues in the system. This is where event sourcing gets help from CQRS. The CQRS architecture pattern prescribes separating the read (query) and write (command) logic of an application.

This separation offers benefits such as the flexibility to scale the write and read models independently, as well as using the most appropriate technology for each concern. There is no longer a single component or technology responsible for managing all the data.

Event sourcing takes care of the write logic: events are persisted in an event store and broadcast using a publish/subscribe approach to inform microservices that there is a change in data. The microservices implement handlers for those events and update their copy of the data. So, write and read logic are spread over different parts of the system.
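The read side can be sketched as follows. This is a toy in-process publish/subscribe bus standing in for a real message broker, with a hypothetical `CustomerUpdated` event; the point is only that the order service maintains its own local copy of customer data instead of calling the customer service synchronously:

```python
from collections import defaultdict

# Toy in-process pub/sub bus; a real system would use a message broker.
class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self._subscribers[event_type]:
            handler(payload)

# Read side: the order service keeps a local copy of customer data,
# updated by handling events, so it never has to call the customer
# service over HTTPS at request time.
class OrderServiceReadModel:
    def __init__(self, bus):
        self.customers = {}
        bus.subscribe("CustomerUpdated", self.on_customer_updated)

    def on_customer_updated(self, event):
        self.customers[event["customer_id"]] = event["address"]

bus = EventBus()
orders = OrderServiceReadModel(bus)

# Write side publishes a change; the read model updates from the event.
bus.publish("CustomerUpdated", {"customer_id": "c1", "address": "Main St 1"})
print(orders.customers["c1"])
```

In a real deployment the publish happens in another process and the update is eventually consistent, which is exactly why the recovery story below matters.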

Event Sourcing and CQRS complement each other really well. Event Sourcing would not work without some means to aggregate current state for query purposes. And having multiple copies of current state in different microservices means it can and will become inconsistent when something in the system breaks. With Event Sourcing the system is able to recover, as current state can be fully rebuilt by replaying the event stream.

Event sourcing has additional benefits, like providing a potential audit log due to the immutable nature of events. The ability to rebuild state from events also gives Event Sourcing good support for temporal queries (point-in-time queries).
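A temporal query falls out of the replay mechanism almost for free: replay only the events up to a cutoff timestamp. A minimal sketch, with hypothetical timestamped events:

```python
from datetime import datetime

# Hypothetical timestamped events: (timestamp, (key, value)) pairs.
events = [
    (datetime(2020, 6, 1, 10, 0), ("shipping", "same-day")),
    (datetime(2020, 6, 1, 10, 5), ("shipping", "regular")),
]

def state_at(events, cutoff):
    """Answer 'what did the state look like at time T?' by partial replay."""
    state = {}
    for timestamp, (key, value) in events:
        if timestamp <= cutoff:
            state[key] = value
    return state

print(state_at(events, datetime(2020, 6, 1, 10, 2)))  # {'shipping': 'same-day'}
print(state_at(events, datetime(2020, 6, 1, 11, 0)))  # {'shipping': 'regular'}
```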

What makes event sourcing complex?

Ironically, the power of event sourcing, having full history of what happened in the form of events, is also where the biggest challenge lies. Especially since events are immutable.

Not being able to change events without significant time investment makes event sourcing especially unforgiving. Found an event modelling decision that you regret in hindsight, or that doesn’t play nice with a new requirement you didn’t anticipate? That thing has happened; you have to deal with it now, along with the complexity it brings.

If you have a team that is highly capable in Domain-Driven Design (DDD) and designing business events, and you work on a system that supports a relatively stable domain that is well known to the organization and team, you are likely in a good place to deliver value despite these challenges.

But this is definitely not the case for most teams out there. Event Sourcing is a big mental leap for developers. Not every developer is fluent in DDD, as we have been working with databases and CRUD operations for a long time. That doesn’t mean we can’t get good at it, but it won’t happen overnight. Then there is the domain itself. Developers are often asked to work on domains they are less familiar with, or to develop a system supporting a new business model that is still very susceptible to change.

In practice, most software development projects fit the description outlined above. In these circumstances, Event Sourcing is challenging and complex.

Another challenge is selecting a fit-for-purpose solution to support Event Sourcing. As events will be the backbone of the architecture, the event store needs to be highly available and scalable. Furthermore, the technology needs to support the right features, like consistent writes, guaranteed event ordering and aggregate (re)building. Covering these concepts goes beyond the scope of this blog, but they are vitally important. At first glance, Apache Kafka would seem like a good fit for Event Sourcing, and I’ve seen teams select it for that purpose. However, Apache Kafka does not support the aforementioned features, which introduces challenges. Bringing in Kafka Streams improves the situation, but it’s not designed for Event Sourcing, so there are limitations. Don’t rush selecting a solution.
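To illustrate one of those features, here is a sketch of the consistent-write guarantee a purpose-built event store offers per aggregate: an append only succeeds if the caller has seen the latest version, otherwise a concurrent writer got there first. This is a toy in-memory illustration of the optimistic concurrency idea, not any vendor’s API:

```python
class ConcurrencyError(Exception):
    pass

# Toy in-memory event store with per-aggregate optimistic concurrency:
# an append succeeds only if the caller saw the current stream version.
class EventStore:
    def __init__(self):
        self._streams = {}  # aggregate_id -> list of events

    def append(self, aggregate_id, expected_version, event):
        stream = self._streams.setdefault(aggregate_id, [])
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(stream)}"
            )
        stream.append(event)

    def read(self, aggregate_id):
        return list(self._streams.get(aggregate_id, []))

store = EventStore()
store.append("order-1", 0, "OrderPlaced")
store.append("order-1", 1, "ShippingSelected")
try:
    store.append("order-1", 1, "OrderConfirmed")  # stale version: rejected
except ConcurrencyError as exc:
    print(exc)
```

A plain Kafka topic will happily accept both concurrent appends, which is one reason it is an awkward fit as the event store itself.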

The final challenge is the General Data Protection Regulation (GDPR), privacy legislation that applies to any system globally that processes data of EU citizens and residents. The GDPR contains a lot of articles, but the one specifically relevant in the context of immutable events is Article 17, the right to erasure (often called the right to be forgotten).

This means that when a customer demands their data be removed, an organization needs to comply. This legislation does not play nice with the concept of immutable events. There are published approaches for dealing with this challenge; it is something to consider when architecting a system using event sourcing and selecting your event store solution.
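One such published approach is crypto-shredding: personal data inside events is encrypted with a per-customer key kept outside the event stream, and "forgetting" the customer means deleting the key, leaving the immutable event intact but unreadable. The sketch below uses a toy XOR cipher purely as a stand-in for a real symmetric cipher:

```python
import secrets

def xor(data: bytes, key: bytes) -> bytes:
    # Toy stand-in for a real symmetric cipher; do NOT use XOR in production.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Per-customer keys live outside the immutable event stream.
key_vault = {"c1": secrets.token_bytes(16)}

# The immutable event only ever holds ciphertext for personal data.
event = {"type": "CustomerRegistered", "customer_id": "c1",
         "name_encrypted": xor(b"Jane Doe", key_vault["c1"])}

def read_name(event):
    key = key_vault.get(event["customer_id"])
    if key is None:
        return None  # key shredded: personal data is unrecoverable
    return xor(event["name_encrypted"], key).decode()

print(read_name(event))  # readable while the key exists
del key_vault["c1"]      # right-to-erasure request: shred the key
print(read_name(event))  # None: the event is unchanged but unreadable
```

The trade-off is that key management becomes part of your architecture, and losing the vault means losing all personal data at once.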

Do most microservice architectures need Event Sourcing?

No, they don’t. I firmly believe that for the majority of microservice architectures, having RESTful HTTPS dependencies between microservices is not the “your system is still a monolith” death sentence that some make it out to be. There is plenty to be done:

  • Reduce the number of dependencies between services through proper service sizing using practices like Bounded Context.
  • Build resilient microservices, where the impact of failures is contained and minimized. There is a big collection of patterns and practices, in most cases bulkhead and circuit breaker are good places to start.
  • Performance tune data storage solutions, use asynchronous I/O and implement caching where it’s absolutely needed. Don’t underestimate the complexity of caching.
  • Run microservices multi-instance by default and perform rolling updates to maximize availability.
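As an example of the resilience patterns mentioned above, here is a minimal circuit breaker sketch: after a number of consecutive failures the circuit opens and calls fail fast for a cool-down period, so a struggling dependency is not hammered with more requests. The threshold and timeout values are illustrative:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: fail fast while a dependency is struggling."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # success resets the failure count
        return result

breaker = CircuitBreaker(max_failures=2, reset_timeout=30.0)

def flaky():
    raise ConnectionError("customer service unavailable")

for _ in range(2):
    try:
        breaker.call(flaky)
    except ConnectionError:
        pass

try:
    breaker.call(flaky)  # breaker is now open: fails fast
except RuntimeError as exc:
    print(exc)
```

Production systems would typically reach for an existing library or a service mesh rather than hand-rolling this, but the mechanism is this simple at its core.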

It’s important to keep a fit-for-purpose mindset, and not buy into the complex architecture patterns sales pitch. Silver bullets don’t exist in software engineering; trade-offs do. Believing otherwise creates an “if all you have is a hammer, every problem looks like a nail” mindset instead of critical thinking.

Fortunately, the truth is that you have a whole toolbox at your disposal. Make sure you know the tools in there, and how to use them without hurting yourself or those around you.


Oskar uit de Bos

Engineering Manager at Albert Heijn, empowering teams to build services and applications used to run over 1100 Albert Heijn stores in the Netherlands!