Understanding Coupling with Event-Driven Architecture

Published in SSENSE-TECH · 10 min read · Jan 13, 2023

Event-Driven Architecture is often chosen due to its promise of decoupling the services involved. While this is true, it does not come without its challenges and can lead to problems that are non-existent with other architectural choices. In this article, I will go back to basics, covering the motivation and essential characteristics of an EDA while evaluating the types of coupling. I will finish by presenting an option that aims to leverage the asynchronous nature of events and keep a healthy coupling level.

Back to Basics

In order to deliver any functionality, a system handles the input it receives, processes it, and potentially needs to access resources outside its execution context.

Figure 1. Single compute unit with its execution context and external resources.

Of course, in most modern situations our system is actually composed of a set of smaller pieces: other subsystems or services. Commonly distributed over a network, these work together to process the input received and carry out specific actions.

Figure 2. Multiple separate execution contexts, one per service being involved.

In their simplest form, the connections illustrated in Figure 2 are synchronous calls. Nowadays, HTTP is a common choice, but other forms such as RPC are also used. This would be fine if it weren’t for the fallacies of distributed computing, such as assuming that the network is reliable and that the topology never changes.

To accommodate the growth in complexity, and be cognizant of those fallacies, our distributed model needed to evolve. Enter the Event-Driven Architecture (EDA).

Event-Driven Architecture

Far from a full-fledged description of all the nuances in an EDA, let’s discuss two of the key aspects:

1. Focus on Notification

Prior to the EDA, you were requesting that an action be taken. Now, you are notifying others that something happened in the service/system. This notification is what we refer to as an event.

2. Asynchronous

Instead of a blocking call, where the originating service waits for a reply prior to continuing, in an EDA we publish the event and move on.

If we take those two aspects, it is already possible to see the allure of EDA. All of a sudden, the fact that the network is unreliable, and that topology can change, is less of an issue as you are no longer waiting for the reply and are not even aware of who consumes the events you published.

This last part is the so-called decoupling of those publishing the events from those that consume them. Figure 3 showcases that while Service A is notifying us of what happened, it is unaware of who is consuming the events and when they are consumed.

Figure 3. Service A publishes the event and continues. Service B and C consume those events independently and at their own pace.

Imagine the possibilities:

Service B can be down and Service A is not affected.
Service B can be configured to batch those events and process them when it is more convenient — or cheaper — to do so.
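The behavior in Figure 3 can be sketched with a toy in-memory broker. This is only an illustration under stated assumptions, not a real messaging infrastructure: the `EventBus` class, its method names, and the `SomethingHappened` event type are all hypothetical.

```python
from collections import defaultdict, deque


class EventBus:
    """Toy in-memory broker: one queue per subscriber, per event type."""

    def __init__(self):
        # event type -> {subscriber name: queue of pending payloads}
        self._queues = defaultdict(dict)

    def subscribe(self, event_type, subscriber):
        self._queues[event_type][subscriber] = deque()

    def publish(self, event_type, payload):
        # The publisher does not know who is subscribed and does not wait.
        for queue in self._queues[event_type].values():
            queue.append(payload)

    def drain(self, event_type, subscriber):
        """Called by a consumer when it is ready; returns everything pending."""
        queue = self._queues[event_type][subscriber]
        batch = list(queue)
        queue.clear()
        return batch


bus = EventBus()
bus.subscribe("SomethingHappened", "service_b")
bus.subscribe("SomethingHappened", "service_c")

# Service A publishes and moves on, even while Service B is "down".
bus.publish("SomethingHappened", {"id": 1})
bus.publish("SomethingHappened", {"id": 2})

# Service C consumes right away.
service_c_batch = bus.drain("SomethingHappened", "service_c")

# Service B comes back later and processes everything as a single batch.
service_b_batch = bus.drain("SomethingHappened", "service_b")
```

Note that `publish` never references a consumer by name: adding or removing subscribers requires no change on the publishing side.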

This is powerful, but before looking at the inevitable trade-offs let’s look at what coupling is and how it can have a negative impact.

Coupling Revisited

When looking for a definition of coupling, we can find one that states it is “the degree of interdependence between software modules; a measure of how closely connected two routines or modules are.” — Wikipedia.

In our context, the greater the dependency between two services, the more coupled those will be.

Types of Coupling

1. Platform

In this form of coupling, you have two or more services that exchange information using a specific binary protocol that is only available on a certain platform or in a certain language. It is less of an issue for most modern systems, but still relevant for legacy ones.

2. Temporal

Perhaps one of the most common forms of coupling, temporal coupling arises from synchronous communication between services: the caller can only make progress if the callee is available and replies at that moment.

3. Context

In this case, the parties involved share some knowledge about each other, from the fact that one must know the API endpoint and signature to what each specific field means.

Why do we strive to reduce the coupling between services? It is mostly about managing complexity and reducing the blast radius in case of problems or changes.

Managing Complexity

While complexity is a relative term, let’s try to look at it from a cardinality angle. Imagine you have a Service A that, in order to deliver the expected functionality, needs to interact with 10 other services.

Figure 4. Multiple dependencies to satisfy a single requirement.

As a developer, you have to understand the purpose and impact of your changes in those dependencies. Additionally, chances are your system context will contain foreign concepts that are needed to interact with all those services.

Using techniques like anti-corruption layers can help, but still, there is the potential for cognitive overload.

Reducing the Blast Radius

In this case, still using the same example, your quality of service is directly affected by the failure of any of those 10 direct dependencies. If any of those dependencies misbehaves, is under maintenance, or is simply unavailable, you will have a direct impact.
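A rough back-of-the-envelope calculation makes this concrete. The sketch below assumes that Service A needs all 10 dependencies to answer a request and that they fail independently of each other; the 99.9% availability figure is a hypothetical input, not a measurement.

```python
def composite_availability(dependency_availabilities):
    """Probability that all dependencies are up at once.

    Assumes failures are independent, which is a simplification.
    """
    result = 1.0
    for availability in dependency_availabilities:
        result *= availability
    return result


# Hypothetical figure: each of the 10 dependencies is up 99.9% of the time.
best_case = composite_availability([0.999] * 10)
print(f"Best-case availability of Service A: {best_case:.2%}")
```

Under these assumptions, ten dependencies at 99.9% each cap Service A at roughly 99.0%, before accounting for any failures of its own.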

Based on what we have seen so far, the solution is simple, right? Adopt EDA and have zero coupling!

The Trade-Off Principle: No Such Thing as a Free Lunch

To understand why zero coupling is not realistic — or that it doesn’t necessarily solve all problems — let’s circle back to the definition of coupling, using the services instead of modules: “…the degree of interdependence between services”.

Zero coupling means two services have no interdependence, which effectively means that Service A does not depend on Service B to carry on with its responsibilities. But what about Service B? At first you may say the same applies, but let’s take a closer look.

Figure 5. Order service is decoupled from the payment service as it does not contact it directly.

Figure 5 illustrates that although the order service is unaware of the payment service, the latter needs to know what an OrderPlaced event means and understand its contents enough to carry on with its expected behavior of capturing the payment.

Figure 6. Payment Service policy needs to know that an order was placed so the amount has to be captured.

Now, imagine that the business decides to offer pre-orders, where only a specific amount should be captured instead of the entire value, and the event contract is updated to reflect that.

In this case, your payment service needs to be updated to understand the new contract and its expected behavior.

Figure 7. Payment Service needs to be updated to understand the changes in the Order.

This indicates that even though there is no coupling of the order service with the payment service, the same can’t be said for the payment service. So, the expectation that an EDA provides zero coupling is not actually true.
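The pre-order scenario can be sketched from the consumer's point of view. This is an illustration only: the `total` and `deposit` fields and both policy functions are hypothetical, since the article does not specify the contents of the OrderPlaced contract.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OrderPlaced:
    order_id: str
    total: float
    # Hypothetical field added for pre-orders; None for regular orders.
    deposit: Optional[float] = None


def amount_to_capture_v1(event: OrderPlaced) -> float:
    """Original payment policy: an order was placed, capture the full total."""
    return event.total


def amount_to_capture_v2(event: OrderPlaced) -> float:
    """Updated policy: the payment service now has to understand pre-orders."""
    if event.deposit is not None:
        return event.deposit
    return event.total


regular_order = OrderPlaced(order_id="o-1", total=100.0)
pre_order = OrderPlaced(order_id="o-2", total=100.0, deposit=20.0)

# The old policy silently over-captures on a pre-order.
old_policy_amount = amount_to_capture_v1(pre_order)
new_policy_amount = amount_to_capture_v2(pre_order)
```

The order service never calls the payment service, yet a business rule about orders ended up inside the payment service's code. That is the coupling that remains.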

While this is not to say that EDA is a bad choice, there are some challenges associated with its adoption that can’t be neglected. Let’s quickly review some of them.

Challenges With an Event-Driven Architecture

The four main challenges that come with an EDA are:

Handling Out-of-Order Events

When developing we tend to have a happy path scenario in mind. If I apply something to Service A, it will publish an event that will be consumed by any services that care about it.

In reality, depending on the infrastructure used, you may end up receiving messages in a different order than they were sent. For example, a standard Amazon SQS queue offers only best-effort ordering, so newer events can arrive before older ones.

Figure 8. Potential out-of-order events being consumed.

Or, if you use an orchestration approach, such as AWS Step Functions, you may end up with the following issue.

Figure 9. Out-of-order even if the events are in the right order.

There is no one-solution-fits-all. You have to assess if, in your context, you can simply disregard old messages, include a buffering window prior to consuming so the events can be re-ordered, or simply fail to process and trigger an error.
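As one illustration, the "disregard old messages" option can be sketched as follows. It assumes the publisher attaches a version number to each event that increases with every change to the same entity; that field, the class, and its method names are hypothetical.

```python
class LatestStateProjection:
    """Keeps the newest known state per entity and ignores stale events."""

    def __init__(self):
        self._state = {}  # entity id -> (version, status)

    def apply(self, entity_id, version, status):
        """Return True if the event was applied, False if it was stale."""
        current = self._state.get(entity_id)
        if current is not None and version <= current[0]:
            return False  # an equal or newer version was already applied
        self._state[entity_id] = (version, status)
        return True

    def status_of(self, entity_id):
        return self._state[entity_id][1]


projection = LatestStateProjection()

# Version 2 arrives before version 1.
applied_newer = projection.apply("order-1", version=2, status="SHIPPED")
applied_older = projection.apply("order-1", version=1, status="PLACED")
```

This only works when the latest event carries everything the consumer needs. If each event represents a step that must not be skipped, buffering or failing is the safer choice.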

Duplicated Messages

It is common to talk about delivery guarantees with any messaging infrastructure, which is the delivery mechanism of your EDA. You will find that “at least once” is the guarantee offered by many solutions, which means there is a chance you will end up receiving the same message more than once.

Figure 10. Receiving the same message twice with a potentially negative outcome.

Here, the solution is to try to make your consumer idempotent, which effectively means this second message will not change the state of your service.
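A minimal sketch of an idempotent consumer, assuming every message carries a unique identifier that stays the same across redeliveries. The `PaymentConsumer` class and the in-memory set are hypothetical stand-ins; a real service would persist the processed identifiers together with the state change.

```python
class PaymentConsumer:
    """Captures each payment at most once, even if a message is redelivered."""

    def __init__(self):
        self._processed_ids = set()
        self.captured_total = 0.0

    def handle(self, message_id, amount):
        """Return True if the message changed state, False if it was a duplicate."""
        if message_id in self._processed_ids:
            return False  # already handled: acknowledge and do nothing
        self.captured_total += amount
        self._processed_ids.add(message_id)
        return True


consumer = PaymentConsumer()
first_delivery = consumer.handle("msg-1", 50.0)
second_delivery = consumer.handle("msg-1", 50.0)  # same message delivered again
```

After both deliveries, `captured_total` is still 50.0: the duplicate is recognized and leaves the state untouched.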

Debugging is Harder

The default mode when using events is fire-and-forget, where your service does not know if any consumer of those events was successful in doing so.

If a consumer fails or a step is missed, you may notice that something is not right, but unlike a direct call, where the failure is presented to the caller immediately, here you have to dig through the logs of each service involved to find out what went wrong.

Understanding Impact is Harder

A common selling point of EDA is that you can expand the system’s behavior without making changes to the originating service.

Figure 11. When it is time to make changes, Order Service has no idea of the impact/affected services.

This flexibility is amazing but comes with the flip side that if you decide to change an event — or add a new one — to capture new business requirements, you do not have a clear understanding of which other services may or may not be impacted.

One recommended mitigation strategy combines two things: a registry that catalogs every subscriber of a given event, and process-based documentation that identifies the services choreographed to deliver each successful outcome.
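In its simplest form, such a registry is a lookup from event type to subscribers. The sketch below is an assumption about what that could look like; the `EventCatalog` class and the `shipping-service` name are hypothetical, and in practice the catalog would live in shared tooling rather than in application code.

```python
from collections import defaultdict


class EventCatalog:
    """Records which services subscribe to which event types."""

    def __init__(self):
        self._subscribers = defaultdict(set)

    def register(self, event_type, service):
        self._subscribers[event_type].add(service)

    def impacted_by(self, event_type):
        """Services to review before changing the given event."""
        return sorted(self._subscribers.get(event_type, set()))


catalog = EventCatalog()
catalog.register("OrderPlaced", "payment-service")
catalog.register("OrderPlaced", "shipping-service")

impacted = catalog.impacted_by("OrderPlaced")
unknown = catalog.impacted_by("OrderCancelled")
```

Before changing OrderPlaced, the owning team can query the catalog and learn that two services need to be reviewed.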

How Can We Improve?

The first step toward improvement is understanding and acknowledging the trade-offs associated with EDA. This will help to gauge the benefits against the extra work that may be required to deliver the solution and prepare accordingly. Additionally, there are two areas to focus on when designing your solution:

1- Reducing the Dependency Cardinality

Figure 12 illustrates the dependencies in two systems.

Figure 12. Looking at the dependency cardinality.

In both cases, you have a high number of dependencies.

In the first system you have many services depending on a given event to trigger their actions. As previously mentioned, this comes with a cost when changing the event, either by creating a new one or adding new properties to capture updated business requirements.

In the second system you have the opposite, a given service has to listen to a myriad of events from various sources. Here the cost comes in the form of having to understand what all of these events are.

One way to reduce the number of dependencies is to group together two or more services that always end up working together towards the same goal. You may have to fight against Conway’s law, but this may be a chance to revert a premature, “noun-based” criterion that led to this breakdown.

Figure 13. Merging two services into one to reduce the dependency count.

2- Balancing the use of Events and Commands

Both commands and events are messages that will flow in your system via some sort of messaging infrastructure. But commands, contrary to events, are actions we expect others to take and are sent directly to one single recipient.

Figure 14. A command is imperative and has a 1:1 relationship with the destination.

In Figure 14, the order service knows of the payment service and explicitly asks for it to capture funds. But because we are leveraging an asynchronous message, it does not wait for the payment service to consume it.

The payment service, on the other hand, does not need to know about the order and exposes an interface that explicitly expects how much should be captured. If we revisit the example where pre-orders would be introduced, nothing would change on the payment service as the order service would be the one indicating the amount to be captured.
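The command-based variant of the pre-order example can be sketched as follows. The `CapturePayment` command, the 20% deposit rule, and the in-memory queue are hypothetical, chosen only to show where the business rule lives.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class CapturePayment:
    """Command: tells the payment service exactly how much to capture."""
    payment_reference: str
    amount: float


def amount_for_order(total, is_pre_order, deposit_rate=0.2):
    """Order-side rule (hypothetical): pre-orders capture only a deposit."""
    return round(total * deposit_rate, 2) if is_pre_order else total


class PaymentService:
    """Knows nothing about orders or pre-orders, only amounts to capture."""

    def __init__(self):
        self.captured = []

    def handle(self, command):
        self.captured.append((command.payment_reference, command.amount))


# Stands in for the queue that delivers commands to the payment service.
payment_inbox = deque()

# The order service sends the commands and does not wait for a reply.
payment_inbox.append(CapturePayment("order-1", amount_for_order(100.0, is_pre_order=False)))
payment_inbox.append(CapturePayment("order-2", amount_for_order(100.0, is_pre_order=True)))

# Later, the payment service consumes them at its own pace.
payment_service = PaymentService()
while payment_inbox:
    payment_service.handle(payment_inbox.popleft())
```

Introducing pre-orders only touched `amount_for_order` on the order side; `PaymentService.handle` stayed the same.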

By using both events and commands you can achieve balance, where you still have the benefits of asynchronous execution with reduced cardinality.

Figure 15. A mixed approach with both asynchronous commands and events.

Conclusion

Event-Driven Architecture is a powerful approach that definitely provides benefits and should be considered when designing your solution. It addresses several aspects that can help your service be more resilient in an environment where traffic spikes and failures happen all the time.

Simultaneously, do not be misled by the idea that choosing EDA solves all problems or that you can achieve a zero coupling nirvana.

Be aware of the challenges that you will now have to face, and consider a mixed approach: keep the dependency cardinality low, regardless of the integration pattern used, and use asynchronous commands/replies alongside events.

Editorial reviews by Catherine Heim & Gregory Belhumeur.
