When discussing autonomous services and event-first architecture, the conversation inevitably turns to the topic of event sourcing. Event sourcing is a very popular design pattern, and arguably one that is not always well understood. I frequently encounter concerns about using it. The perception is that event sourcing is complicated and should be used sparingly. But event sourcing has evolved as the technologies around it have evolved. I certainly concede that some of the nuances of event sourcing are not for the faint of heart. Yet the benefits of event sourcing warrant using it across an entire system. So let's look at how we can achieve system wide event sourcing, while letting individual autonomous services focus on the nuances that are applicable to them.
In traditional event sourcing, every domain event is represented as an immutable row in a database (i.e. log). To know the current state of a domain object we must calculate it from all the domain events. This is where the concern about complexity creeps in. To combat this complexity we can add the additional complexity of taking periodic snapshots, so that we only need to process the snapshot plus the more recent events to determine the current state.
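To make this concrete, here is a minimal sketch of deriving current state by folding over the event log, and of how a snapshot shortens that fold. The event shape and amounts are illustrative assumptions, not taken from any particular event store.

```python
from dataclasses import dataclass

# Hypothetical event shape: each event is an immutable record in the log.
@dataclass(frozen=True)
class Event:
    type: str      # e.g. "deposited" or "withdrawn"
    amount: float

def current_balance(events, snapshot=0.0):
    """Fold events over a starting snapshot to derive the current state."""
    balance = snapshot
    for e in events:
        if e.type == "deposited":
            balance += e.amount
        elif e.type == "withdrawn":
            balance -= e.amount
    return balance

log = [Event("deposited", 100.0), Event("withdrawn", 30.0), Event("deposited", 10.0)]

# Full replay from the beginning of the log...
full = current_balance(log)  # 80.0
# ...or replay only the events after a snapshot taken at the first two events,
# which yields the same state while processing fewer events.
from_snapshot = current_balance(log[2:], snapshot=70.0)  # also 80.0
```

The snapshot is purely an optimization: it changes how much of the log must be processed, never the resulting state.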
With all this complexity in mind, you can imagine the reactions I receive when I start talking about using event sourcing across an entire system. To be clear, when I refer to system wide event sourcing, I mean that in aggregate the whole system is event sourced, not that every autonomous service in the system has to perform all the gory event sourcing details.
The best way to convey this system wide approach is with an example, such as the classic banking example of making deposits and withdrawals and viewing the current balance.
The first autonomous service in the example is the Account BFF (backend-for-frontend), which allows account owners to view their balance and make deposits and withdrawals. When the account owner makes a deposit or withdrawal, the service persists the state change events (deposited|withdrawn) directly to the event stream. This service is not concerned with calculating the new balance. Instead, it just listens for balance-updated events and records them in its own balances table (i.e. cache/materialized view), so that account owners can see their balances before initiating a transaction.
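The Account BFF's two responsibilities can be sketched in memory as follows. The stream, event shapes, and account ids are illustrative assumptions; in practice the stream would be a managed log such as Kinesis or Kafka.

```python
event_stream = []   # stands in for the shared event stream
balances = {}       # the BFF's own balances table (cache/materialized view)

def deposit(account_id, amount):
    # Persist the state change event directly to the stream; no balance math here.
    event_stream.append({"type": "deposited", "account": account_id, "amount": amount})

def withdraw(account_id, amount):
    event_stream.append({"type": "withdrawn", "account": account_id, "amount": amount})

def on_balance_updated(event):
    # Listener: record balances so owners can view them before transacting.
    balances[event["account"]] = event["balance"]

deposit("acct-1", 100.0)
withdraw("acct-1", 30.0)
# Some time later, another service publishes the recalculated balance
# and this listener caches it.
on_balance_updated({"type": "balance-updated", "account": "acct-1", "balance": 70.0})
```

Note that the write path and the read path never meet inside this service: it emits raw state change events and consumes derived ones.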
Next, the Balance Control autonomous service listens for deposited and withdrawn events, then correlates and collates them in its own micro-event-store. The addition of a new event to the store triggers retrieval of the related (i.e. correlated) events and calculation of the new account balance. The result of the calculation is persisted straight to the event stream as a balance-updated event for other services to consume. This service also listens for its own balance-updated events and uses them as snapshots to improve its own performance.
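Here is an illustrative sketch of that loop: correlate incoming events by account, recalculate on each new event, publish the result, and treat the service's own balance-updated events as snapshots. The store structure and event shapes are assumptions for the sake of the example.

```python
micro_event_store = {}   # account id -> correlated events, in arrival order
out_stream = []          # balance-updated events published for other services

def on_event(event):
    # Correlate and collate the event in the micro-event-store.
    events = micro_event_store.setdefault(event["account"], [])
    events.append(event)
    if event["type"] == "balance-updated":
        return  # its own event, recorded as a snapshot; nothing to recalculate
    # A new event triggers retrieval of the correlated events and recalculation.
    balance = 0.0
    for e in events:
        if e["type"] == "balance-updated":
            balance = e["balance"]      # snapshot: resume the fold from here
        elif e["type"] == "deposited":
            balance += e["amount"]
        elif e["type"] == "withdrawn":
            balance -= e["amount"]
    out_stream.append({"type": "balance-updated",
                       "account": event["account"], "balance": balance})

on_event({"type": "deposited", "account": "acct-1", "amount": 100.0})
on_event({"type": "withdrawn", "account": "acct-1", "amount": 30.0})
on_event(out_stream[-1])  # the service consumes its own balance-updated event
on_event({"type": "deposited", "account": "acct-1", "amount": 10.0})
```

After the snapshot lands in the store, each recalculation resumes from it rather than re-folding the full history.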
Any number of other services, such as the Reporting BFF, can consume the events and use them as they please. In this case the Reporting BFF maintains its own micro-event-store, so that it can trigger the calculation of statistics however it needs. ACID 2.0 techniques are naturally applicable here for performing calculations.
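As a sketch of an ACID 2.0-style calculation (Associative, Commutative, Idempotent, Distributed), the Reporting BFF might key its store by event id so that duplicate deliveries and out-of-order arrival do not change the result. The event shapes and ids are illustrative assumptions.

```python
stats_store = {}  # account id -> {event id -> amount}

def on_deposited(event):
    # Keyed by event id, the write is idempotent: a re-delivered event
    # simply overwrites itself. Summing a dict is order-independent.
    stats_store.setdefault(event["account"], {})[event["id"]] = event["amount"]

def total_deposited(account_id):
    return sum(stats_store.get(account_id, {}).values())

on_deposited({"id": "e1", "account": "acct-1", "amount": 100.0})
on_deposited({"id": "e2", "account": "acct-1", "amount": 50.0})
on_deposited({"id": "e1", "account": "acct-1", "amount": 100.0})  # duplicate delivery
```

Because the merge is associative, commutative, and idempotent, the service needs no coordination with the producers of the events.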
Finally, the Lake service consumes, indexes and stores all events in perpetuity, as the source of truth for events. Events can be replayed from the lake to repair existing services and seed new services.
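Replay from the lake can be as simple as streaming the stored events, in order, back through a service's normal event handler. This is a minimal sketch; the sequence field and filtering are assumptions about how a lake might index events.

```python
lake = [
    {"seq": 2, "type": "withdrawn", "account": "acct-1", "amount": 30.0},
    {"seq": 1, "type": "deposited", "account": "acct-1", "amount": 100.0},
    {"seq": 3, "type": "balance-updated", "account": "acct-1", "balance": 70.0},
]

def replay(lake_events, handler, event_types=None):
    """Feed stored events, in order, to a new or repaired service's handler."""
    for e in sorted(lake_events, key=lambda e: e["seq"]):
        if event_types is None or e["type"] in event_types:
            handler(e)

# Seed a new service that only cares about deposited and withdrawn events.
seen = []
replay(lake, seen.append, event_types={"deposited", "withdrawn"})
```

The same mechanism repairs an existing service: drop its derived state and replay the events of interest.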
The main advantage of this system wide approach is that we retain the benefits of event sourcing while controlling the complexities. All events are elevated to first-class citizens in the system, as opposed to just being ephemeral messages that disappear after they are consumed. Services simply persist their state change events to the event stream. The event stream provides a temporal event store (i.e. log), so that services can consume events in real-time. The lake maintains a system wide audit trail of all events in a perpetual event store, for all kinds of future uses.
The complexities of event sourcing, such as calculating the current state and creating snapshots, are delegated to specialized services. These autonomous services consume events of interest and maintain their own micro-event-stores to support their specific calculations. As new events are inserted, the services continuously perform their current state calculations and share the results with other services as first-class events. As a testament to this approach, these same events naturally serve as snapshots to improve the performance of these services. I will discuss the order tolerance, idempotency and ACID 2.0 characteristics of these services in a separate post.
The flexibility of the system is significantly improved. Following an event-first mindset, the system can easily and continuously evolve as producers and consumers of well-defined event types are added and removed. For example, the Account BFF in this example could be one of many account BFFs, each tailored to a specific account owner usage scenario. Or multiple versions of the same BFF could exist concurrently as enhancements are rolled out. It is the contracts of the well-defined event types and the event-first mindset that facilitate this evolution. I will discuss event type contracts in a future post.
Last, but certainly not least, system wide event sourcing and its event-first mindset provide the foundation for creating autonomous services with necessary bulkheads! I discuss how System Wide CQRS completes this foundation here.