EventBridge Storming — How to build state-of-the-art Event-Driven Serverless Architectures
Microservice architectures only work when they:
- Are split into clear Services
- Can be deployed independently
- Only communicate with each other asynchronously
- Master their own data
Service-Oriented Architecture (SOA), a term that preceded Microservices, has very similar tenets. SOAs often differ from what we see as modern Microservices in their communication mechanism.
One popular variant of the SOA approach is the Event-Driven Architecture (EDA) paradigm in which services consume and produce events, allowing a loosely coupled interaction between services. Such an approach to SOA often relied on an Enterprise Service Bus (ESB) to provide the transport of these “Events”.
What is an Enterprise Service Bus (ESB):
- Enterprise: Implies use in large Enterprise organisations, both because ESBs were often used to tackle the complexity of these domains and because of the historically large infrastructure investment needed.
- Service: Provides a way for the different Services (logical self-contained representations of business processes) to communicate.
- Bus: Referencing the hardware element of computers that allows the transfer of signals between different components.
Event: “A significant change in state” — K. Mani Chandy
“Bridge over Troubled Architecture”
In 2019 AWS launched a new Serverless service, Amazon EventBridge, formalising the flow of Events through Serverless architectures. Before this, many people working in an Event-Driven paradigm had been “hacking” a Bus for events on top of CloudWatch Events.
For a full understanding of EventBridge see the previous article, EventBridge: The key component in Serverless Architectures and watch James Beswick’s great introduction video.
In short, EventBridge is the biggest Serverless announcement since the release of AWS Lambda and is the key component to building state-of-the-art Serverless EDAs.
The ESB is dead, long live the ESB
In contrast to typical ESB solutions, EventBridge is completely Serverless. It requires no management and integrates easily with all the existing AWS services. By default, all AWS cloud events from CloudWatch go into a “default” Event Bus, and you’re able to build your own Event Bus inside EventBridge for your custom application Events.
Avoiding the “Lambda Pinball”
The Lambda Pinball is a Serverless anti-pattern highlighted by ThoughtWorks, in which “we lose sight of important domain logic in the tangled web of lambdas, buckets and queues as requests bounce around increasingly complex graphs of cloud services.”
This is often the result of a lack of clear Service boundaries. Moving to an EDA and adopting EventBridge can help massively — but this is not a standalone silver bullet.
What is needed is a focus on Services, identifying clear Bounded Contexts (to borrow from Domain-Driven Design) and sharing Event Schemas, not code, API interfaces or Data.
Event Storming is a workshop approach to defining the Events, Boundaries and Entities in your business domain created by Alberto Brandolini as an extension to Domain-Driven Design (DDD).
Full-on DDD can be a lot to learn and master, and can result in over-engineering for systems without huge domain complexity.
Event Storming though can be used in isolation.
The following guide will lay out the steps to Event-Storm towards a state-of-the-art Event-Driven Serverless Architecture based on EventBridge.
This variant of Event Storming we’ve ended up calling EventBridge Storming. The focus is less on formal DDD and more on pragmatically structuring Serverless architectures based on EventBridge.
“EventBridge Storming” can be used on Serverless greenfield or brownfield projects, ranging from the extremely simple to the scarily complex. It ensures a common language and maximum understanding of Events in business domain terms, and produces a list of independent Services to create and a Schema to add to the EventBridge Schema Registry.
Benefits of “EventBridge Storming”
- Reduced Coupling
- Faster development speed in the medium to long term.
- More adaptable architecture & reduced rebuild risk
- Reduced code requirements
- Better system ownership by teams
- Improved availability
EventBridge Storming Guide
“EventBridge Storming”: A specific variant of EventStorming that reduces rework and tight-coupling for teams building state-of-the-art Serverless Event-Driven Architectures with EventBridge.
EventBridge Storming, based on Event Storming, starts with the business and technical members of a project conducting a whiteboard workshop to understand their systems. This is done, ideally in person, with lots of Post-It notes to hand. Typical Event Storming has particular guidelines around Post-It colours, though we’ll be focusing on Events and these rules are less important.
Following the whole-team workshop (steps 1–5), the technical team will take the output of this session and continue to encode this into the architecture (steps 6–8). This whole process should be done in less than a week, with steps 1–5 done in a 1-day session. The exact timing depends on domain complexity and team cohesion.
Ideally the steps are done in person, but when this is not possible a group video call and the use of a tool like https://metroretro.io/ can still be highly efficient.
1. Event Discovery
As defined earlier, an Event can be understood as a “significant change in state”. In this workshop, the focus is on Events from the business domain. The focus should not be on technical Events or implementation details; instead, the real-world Events the system needs to handle must be elaborated.
This first step involves the whole team writing Events on Post-Its and putting them on a large blank canvas. All events should be written in the past tense and should focus on verbs.
For instance, “Order Placed”, “Payment Failed”, “Payment Succeeded”, “Item Dispatched”.
It’s important not to focus on grouping, removing duplicates or filtering during this time — it’s crucial to get all known domain Events on the board.
Clean-up and Grouping are done later.
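Once the canvas is full, the raw notes can be captured as plain data for the later steps. The following is a minimal sketch, not part of Event Storming itself: `looks_past_tense` is a hypothetical heuristic (the last word ends in "ed") that flags notes reading like commands rather than Events. Irregular verbs such as "Sent" would be flagged wrongly, so the result is best treated as a prompt for discussion.

```python
def looks_past_tense(event_name: str) -> bool:
    """Naive heuristic (an assumption of this sketch): the last word ends in 'ed'."""
    words = event_name.split()
    return bool(words) and words[-1].lower().endswith("ed")


def review_sticky_notes(notes: list[str]) -> tuple[list[str], list[str]]:
    """Split raw notes into likely Events and notes to discuss with the room."""
    events = [n for n in notes if looks_past_tense(n)]
    to_discuss = [n for n in notes if not looks_past_tense(n)]
    return events, to_discuss


# Hypothetical workshop output; duplicates are kept on purpose at this stage.
notes = ["Order Placed", "Payment Failed", "Item Dispatched", "Send Invoice", "Order Placed"]
events, to_discuss = review_sticky_notes(notes)
print(events)
print(to_discuss)
```

Here "Send Invoice" lands in `to_discuss` because it describes something to do, not something that happened; the room might rename it to "Invoice Sent" or treat it as a trigger in step 3.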
For a simple greenfield project in a startup environment, this can take as little as 45 minutes — for large organisational digital transformation, it can be much longer.
2. Temporal Sequencing
This step involves putting all the Events in time order, left to right (concurrent Events are stacked). The focus here is not grouping, it’s getting all the cards in order left to right. If it gets complicated to organise subprocesses, use some vertical separation, but keep this time focused on getting events in order, not groups.
Keep going backwards and forwards across the sequence, adding in missing Events, having those sometimes complex conversations, agreeing on terms and removing duplicates.
3. Trigger detection — Optional
Trigger Detection is a stage often used in Event Storming where we add the triggers, commands, external systems and actors. This can be very useful in forcing ourselves to think more critically about the Events of our system and ensure shared terminology — yet this is not key for all systems.
4. Categorize into Entities (& Aggregates)
This is the first phase of grouping. The first step is to look for the nouns in the Events on the board. From this, it’s easy to start to create new Post-Its (preferably a different colour to the Events), with these Entity names. If the Trigger Detection step has been skipped, Actors may be captured as Entities.
In formal DDD, an Aggregate is a collection of Entities (e.g. a Car, Order, Store). This workshop is focussed on Event modelling and not data modelling. Staying at a fairly high level of abstraction when it comes to the Entities will be best — use the Entities emerging from the Events (some of these may be aggregates, but that’s not a distinction we need at this point and it can prove confusing for the room).
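As a rough illustration of pulling Entities out of Event names, the sketch below groups Events by the noun that precedes the verb. It assumes every Event follows an "<Entity> <Verb>" naming convention, which is a convention of this example only; `group_by_entity` is a hypothetical helper.

```python
from collections import defaultdict


def group_by_entity(event_names: list[str]) -> dict[str, list[str]]:
    """Group Events by the noun phrase that precedes the final (verb) word.

    Assumes every Event is named "<Entity> <Verb>"; a single-word Event
    is filed under its own name.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for name in event_names:
        entity, _, _verb = name.rpartition(" ")
        groups[entity or name].append(name)
    return dict(groups)


events = ["Order Placed", "Payment Failed", "Payment Succeeded", "Item Dispatched"]
entities = group_by_entity(events)
print(entities)
```

The keys of `entities` (Order, Payment, Item) are candidates for the new Post-Its; real boards will need human judgement where the naming is less regular.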
5. Categorization into Bounded Contexts
This second stage of grouping is aimed at finding the boundaries between our to-be Systems. The aim is to eliminate dependencies between these groupings. As a team, draw circles around logical Event and Entity groupings, optimizing to reduce interdependency; this will reduce tight and temporal coupling in the resultant system.
In formal DDD, a Bounded Context is a boundary of language consistency: a context in which a term is understood ubiquitously. For instance, Order may mean one thing in an Item Dispatch context and another thing in a Marketing context. This is quite a nuanced concept and for some simpler systems the idea of term inconsistency may not present itself, yet it’s still very useful to make logical groupings of events that eliminate dependencies. This is a starting point to elaborate the list of microservices and their structure.
The clearer the understanding of a system in business terms the clearer and more adaptable the resulting code.
If an Aggregate/Entity appears in multiple bounded contexts this is not necessarily a problem — it just means that the underlying systems may need to duplicate data to be deployable in isolation and consideration must be given to potential temporal coupling.
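To make the shared-Entity case concrete, the circles on the whiteboard can be written down as a mapping from context to Entities and checked for overlap. This is a minimal sketch; the context names and the `entities_in_multiple_contexts` helper are hypothetical.

```python
def entities_in_multiple_contexts(contexts: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return each Entity that appears in more than one Bounded Context."""
    seen: dict[str, list[str]] = {}
    for context, entities in contexts.items():
        for entity in entities:
            seen.setdefault(entity, []).append(context)
    return {e: sorted(ctxs) for e, ctxs in seen.items() if len(ctxs) > 1}


# Hypothetical groupings drawn on the whiteboard.
contexts = {
    "Order Management": {"Order", "Payment"},
    "Item Dispatch": {"Order", "Item"},
    "Marketing": {"Order", "Campaign"},
}
shared = entities_in_multiple_contexts(contexts)
print(shared)
```

Each Entity reported in `shared` is a place where data may need to be duplicated across systems, and where temporal coupling deserves a conversation.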
Now that the Bounded Contexts have been identified, they need to be named with agreement across the business and technical stakeholders. For instance, Order Management.
From now on the rest of the EventBridge Storming can be handled by the technical team.
6. Name Microservices
A Lambda does not a Microservice Make
We need to build Microservices corresponding to our Bounded Contexts. Based on the above example you can intuitively get an idea around the granularity of a Microservice in such an approach.
Many teams though, especially when a similar session has not been done, can fall into the trap of saying a Lambda is a Microservice (or NanoService). This leads to very bloated Lambdas, poor management of multiple Lambdas that should have been grouped as a Service and completely overlooking the deployment and configuration of other components like DynamoDB, Step Functions, SQS, SNS, etc…
In the case of DynamoDB, thinking in terms of multiple Services associated with Bounded Contexts and their underlying Aggregates/Entities can result in purposeful data duplication. Without this process, a shared database anti-pattern can then violate the independent deployment tenet of the SOA approach.
Building a List of (Micro)Services
A Bounded Context is not the same thing as a Microservice. A Bounded Context is a space in which particular terminology can be understood ubiquitously, generally involving a single team with few stakeholders.
In contrast, a Microservice refers to a set of things that are deployed together.
- One Bounded Context may have multiple Microservices corresponding to its underlying Aggregates & Entities.
- ⚠️If one Microservice is implicated in multiple Bounded Contexts… alarm bells should be ringing! This often results in a tightly-coupled distributed monolith.
When we formed our Bounded Contexts we tried to avoid interdependency. It’s the same in forming our Microservices. We should try to eliminate the need for synchronous requests between services. For instance, an architecture in which all services have to call the Order Microservice to complete their processes creates a single point of failure and the tight coupling decreases the adaptability of the system.
As a technical team, try to list the Microservices needed for each Bounded Context. In simple systems, this may be a 1–1 mapping. In more complex contexts it may be per Aggregate contained in a Bounded Context.
⚠️If an Entity or Aggregate appears in two Bounded Contexts, this implies there should be two independent Microservices to handle the divergent business processes in these two contexts.
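The two warnings above can be turned into a quick check on the draft service list. The sketch below is illustrative only: the `Microservice` record, the service names and `distributed_monolith_risks` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microservice:
    name: str
    bounded_contexts: frozenset[str]  # contexts whose Events this service handles


def distributed_monolith_risks(services: list[Microservice]) -> list[str]:
    """Names of services implicated in more than one Bounded Context."""
    return [s.name for s in services if len(s.bounded_contexts) > 1]


# Hypothetical service list for illustration.
services = [
    Microservice("order-service", frozenset({"Order Management"})),
    Microservice("dispatch-service", frozenset({"Item Dispatch"})),
    Microservice("shared-order-api", frozenset({"Order Management", "Item Dispatch"})),
]
risks = distributed_monolith_risks(services)
print(risks)
```

A service such as `shared-order-api` that spans two contexts is a candidate for splitting into one independent service per context.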
7. Creating a Single EventBridge Event Bus
With the exception of the Default Bus, your architecture should have one and only one Event Bus. This is because all Events should be consumable by all Services, without teams and code having to coordinate.
Teams should be able to discover Events and consume them as needed — only having to receive and publish to one bus.
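As a sketch of what publishing to and consuming from one bus might look like, the code below builds an entry in the shape accepted by the EventBridge PutEvents API and an event pattern a consumer could attach to a rule. The bus name `application-bus`, the source `order-management` and both helper functions are assumptions made for illustration; no AWS call is made.

```python
import json

BUS_NAME = "application-bus"  # hypothetical name for the single custom Event Bus


def build_put_events_entry(source: str, detail_type: str, detail: dict) -> dict:
    """Build one entry in the shape the EventBridge PutEvents API expects.

    With boto3 installed and credentials configured, the result could be sent
    with boto3.client("events").put_events(Entries=[entry]); omitted here.
    """
    return {
        "EventBusName": BUS_NAME,
        "Source": source,  # the publishing Service
        "DetailType": detail_type,  # the Event name
        "Detail": json.dumps(detail),  # the payload must be a JSON string
    }


def build_rule_pattern(source: str, detail_type: str) -> dict:
    """Event pattern a consuming Service could attach to a rule on the same bus."""
    return {"source": [source], "detail-type": [detail_type]}


entry = build_put_events_entry("order-management", "OrderPlaced", {"orderId": "o-123"})
pattern = build_rule_pattern("order-management", "OrderPlaced")
print(entry)
print(pattern)
```

Because publisher and consumer only share the bus name and the Event's source and name, neither needs to know which Service sits on the other side.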
8. Building a Shared Schema
Teams should be able to work on and deploy Services independently. With a move to a Serverless EDA based on EventBridge, we’ve moved away from the mental model of synchronous Request-Response, instead thinking of asynchronous interactions through unidirectional Events.
Teams need to agree on the structure of Events, their Schema. This includes the title of the Event and the structure & types of attributes. Teams, and even systems managed by a single team, should share a Schema, not a database, codebase or API interface.
All the technical team members, or representatives if in a larger context, should dedicate workshops to working out the typed Schema for all Events.
This is added to the EventBridge Schema Registry (see more on this in the previous article on EventBridge).
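To give a feel for what a typed Schema could contain, here is a minimal sketch of a JSON Schema (Draft 4 style) for a hypothetical "OrderPlaced" Event, plus a tiny checker. The field names are invented for illustration, `schema_violations` is a stand-in for a real JSON Schema validator, and the formats the Schema Registry accepts are worth confirming against the current AWS documentation.

```python
# JSON Schema (Draft 4 style) for a hypothetical "OrderPlaced" Event detail.
ORDER_PLACED_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "OrderPlaced",
    "type": "object",
    "required": ["orderId", "customerId", "totalCents"],
    "properties": {
        "orderId": {"type": "string"},
        "customerId": {"type": "string"},
        "totalCents": {"type": "integer"},
    },
}

# Maps the JSON Schema type names used above to Python types.
_PYTHON_TYPES = {"string": str, "integer": int}


def schema_violations(detail: dict, schema: dict) -> list[str]:
    """Check required fields and primitive types only (sketch, not a full validator)."""
    problems = []
    for field in schema["required"]:
        if field not in detail:
            problems.append(f"missing field: {field}")
    for field, rule in schema["properties"].items():
        if field in detail and not isinstance(detail[field], _PYTHON_TYPES[rule["type"]]):
            problems.append(f"wrong type for {field}: expected {rule['type']}")
    return problems


good = {"orderId": "o-1", "customerId": "c-1", "totalCents": 1999}
bad = {"orderId": 1, "customerId": "c-1"}
print(schema_violations(good, ORDER_PLACED_SCHEMA))
print(schema_violations(bad, ORDER_PLACED_SCHEMA))
```

A producer and a consumer that both check against the same definition can evolve independently, as long as changes to the Schema are agreed first.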
Teams should share Schema, not data and code.
In short, we need to build using decoupled Services in an Event-Driven approach. State-of-the-art serverless EDAs leverage EventBridge to achieve this.
Event Storming is an extremely useful tool and provides the key first step. The EventBridge Storming variant explained in this article is a pragmatic guide to building a state-of-the-art Serverless application using EventBridge.
This is not a plan set in stone: it’s a starting point, and such sessions can be repeated.
EventBridge Storming: A specific application of EventStorming that reduces rework and tight-coupling for teams building state-of-the-art Serverless Event-Driven Architectures with EventBridge