In this article, we will look at how testing differs for event-driven systems compared to traditional point-to-point architectures, and we’ll propose an approach for implementing a robust testing strategy for these systems.
Event-Driven architecture refresher
First, let’s do a short review of what an Event-Driven architecture looks like, compared to a traditional point-to-point, synchronous system. Let’s start with the traditional design. Below is an example of an order processing system, which handles payments, shipments, inventory management, and email notifications.
In this architecture, the Order Service acts as an orchestrator, coordinating the actions of all the other services, typically through direct API calls. This service is the brains of the system and is responsible for enforcing business rules around the order lifecycle, such as sending an email notification to the customer when the order has been shipped.
Now let’s compare this with an event-driven system.
The first thing you might notice is the absence of an orchestrating service. In this architecture, each service is responsible for announcing facts or events that pertain to its area of focus — such as a payment being successfully processed — but does not enforce rules that are driven by these facts. This pattern is often referred to as choreography, where each service knows how to respond to an event that has occurred, like ballet dancers observing one another, each responsible for their own timing. This pattern is also called Event Collaboration.
The choreographed approach tends to produce a loosely coupled architecture that is easier to develop and extend in the future. This topic has been written about in many places, so we will not cover it here.
Trade-offs of Event-Driven Systems
As we just saw, event-driven systems are a whole different beast from traditional monolithic systems or classic point-to-point microservices. The data flow is asynchronous, and components are decoupled and have a single responsibility — all great characteristics in a modern software architecture! This allows for scalability (both in terms of performance and of scaling development teams), isolation of change, fault tolerance, and more. These benefits don’t come for free, however: there is a tax to be paid. As Martin Fowler explains, distributed systems require much higher operational maturity and the adoption of a DevOps culture on your team. This is necessitated by the proliferation of components compared to monolithic systems, all of which need to be built, versioned, deployed, monitored, and so on. The effort of manually executing these tasks becomes ever more prohibitive as the number of components grows.
In addition to the operational requirements that a typical microservice architecture imposes on an organization, event-driven systems carry an extra tax. At the integration level, this architecture pattern is very different from the point-to-point communication adopted by most microservice systems, which resembles traditional monoliths talking to external systems. This key difference requires rethinking many software patterns, including testing, which is the focus of this article.
There are multiple levels of tests you will typically write for your system. In the most canonical case, you will write unit tests, service tests, and end-to-end tests. In each of these cases, your System Under Test (SUT, what is actually being tested) comprises a different part of your application.
Unit tests are the most basic tests you will write. The SUT in this case is typically an individual class. Let’s say the Payment Service needs to apply a sales tax based on the customer’s location. You would likely have a TaxCalculator class with a calculate() method that accepts an Order argument and returns a double value representing the tax to be applied to the total. Your unit test will interact with this class directly, passing various Order values to it and verifying the tax has been calculated properly. At the unit level, the differences between a point-to-point and an event-driven system are insignificant, so we will not go deeper into it.
Here’s what this could look like in Java, regardless of the architectural style:
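Here is a minimal sketch. The shapes of the Order and TaxCalculator classes and the flat per-state rates are illustrative assumptions, and plain assertions are used for brevity where a real project would use a framework like JUnit:

```java
// Illustrative domain class; the real Order would carry many more fields.
class Order {
    private final double total;
    private final String customerState;

    Order(double total, String customerState) {
        this.total = total;
        this.customerState = customerState;
    }

    double getTotal() { return total; }
    String getCustomerState() { return customerState; }
}

class TaxCalculator {
    // Hypothetical flat rates; a real service would load these from configuration.
    double calculate(Order order) {
        double rate;
        switch (order.getCustomerState()) {
            case "CA": rate = 0.0725; break;
            case "NY": rate = 0.04;   break;
            default:   rate = 0.0;
        }
        return order.getTotal() * rate;
    }
}

class TaxCalculatorTest {
    private final TaxCalculator calculator = new TaxCalculator();

    void appliesCaliforniaSalesTax() {
        // The test talks to the class directly: pass an Order in, check the number out.
        double tax = calculator.calculate(new Order(100.00, "CA"));
        assert Math.abs(tax - 7.25) < 0.001 : "expected 7.25 but got " + tax;
    }

    void appliesNoTaxForUnknownState() {
        double tax = calculator.calculate(new Order(100.00, "OR"));
        assert tax == 0.0;
    }
}
```

Note that nothing in this test knows whether the surrounding system is point-to-point or event-driven; that is exactly why unit tests look the same in both styles.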
Service tests, as the name suggests, treat the entire service as the SUT, and increasingly, in microservice architectures, this is where the bulk of automated testing occurs. These tests verify the contract of a service: given certain inputs, the service produces certain outputs. They execute in-memory (not against a deployed service), run prior to deployment, and are mostly ‘cheap’ to run. This is also where we begin to see substantial differences in test implementation between point-to-point and event-driven systems. The reason is a radically different collaboration style between services and, hence, different contracts that these tests need to cover.
We’ll start with a point-to-point system. The Payment Service’s contract describes the HTTP request and response exchanged with its only consumer — the Order Service.
In Java, this could look like this:
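The following is a sketch using Spring Boot’s MockMvc to exercise the service in-memory. The endpoint path, payload shape, and expected response fields are assumptions for illustration, not the real service’s contract:

```java
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.servlet.AutoConfigureMockMvc;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.http.MediaType;
import org.springframework.test.web.servlet.MockMvc;

import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.jsonPath;
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

@SpringBootTest
@AutoConfigureMockMvc
class PaymentServiceContractTest {

    @Autowired
    private MockMvc mockMvc;

    @Test
    void processesPaymentForValidOrder() throws Exception {
        // Input: the HTTP request the Order Service would send.
        // Output: the HTTP response it expects back -- that pair IS the contract.
        mockMvc.perform(post("/payments")
                .contentType(MediaType.APPLICATION_JSON)
                .content("{\"orderId\": \"order-123\", \"amount\": 107.25}"))
            .andExpect(status().isOk())
            .andExpect(jsonPath("$.orderId").value("order-123"))
            .andExpect(jsonPath("$.status").value("SUCCESSFUL"));
    }

    @Test
    void rejectsPaymentWithInvalidAmount() throws Exception {
        mockMvc.perform(post("/payments")
                .contentType(MediaType.APPLICATION_JSON)
                .content("{\"orderId\": \"order-123\", \"amount\": -1.00}"))
            .andExpect(status().isBadRequest());
    }
}
```

The test runs against an in-memory instance of the service; no real network calls or deployed environment are involved.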
In an event-driven system, the Payment Service’s contract is different. Rather than HTTP messages, it operates on events. In this case, the service “promises” to process a payment and emit a Payment Successful event on the Payments topic whenever a new order is placed on the Orders topic.
The important thing is that, due to the decoupled nature of this architecture, the service doesn’t need to know the origin of the ‘input’ message and, consequently, doesn’t need to know about the consumers of the ‘output’ message. Orchestrating such a test is a bit more involved than with a point-to-point service test, as this time there is a message broker in the picture. With a mature tech stack like Kafka and Spring Boot, we have access to an embedded message broker specifically for this use case. With other frameworks or messaging platforms, we would be forced to resort to mocking abstractions over the broker interactions and verifying the SUT’s interactions with them. This is much less desirable, as it results in a more ‘white box’ test than we would want at the service level.
In Java with Spring Boot, the test could look like this:
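Below is a sketch using spring-kafka-test’s embedded broker. The topic names, group id, and event payloads are assumptions, and exact helper signatures vary slightly between spring-kafka versions:

```java
import java.util.Map;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.kafka.core.DefaultKafkaConsumerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.test.EmbeddedKafkaBroker;
import org.springframework.kafka.test.context.EmbeddedKafka;
import org.springframework.kafka.test.utils.KafkaTestUtils;

import static org.junit.jupiter.api.Assertions.assertTrue;

@SpringBootTest
@EmbeddedKafka(partitions = 1, topics = {"orders", "payments"})
class PaymentServiceEventTest {

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @Autowired
    private EmbeddedKafkaBroker embeddedKafka;

    @Test
    void emitsPaymentSuccessfulWhenOrderPlaced() {
        // Subscribe to the output topic before producing the input event.
        Map<String, Object> props =
            KafkaTestUtils.consumerProps("test-group", "true", embeddedKafka);
        try (Consumer<String, String> consumer =
                 new DefaultKafkaConsumerFactory<>(
                     props, new StringDeserializer(), new StringDeserializer())
                     .createConsumer()) {
            embeddedKafka.consumeFromAnEmbeddedTopic(consumer, "payments");

            // Input: an Order Placed event, exactly as any upstream producer would publish it.
            kafkaTemplate.send("orders",
                "{\"orderId\": \"order-123\", \"total\": 107.25}");

            // Output: the service's "promise" -- a Payment Successful event on the Payments topic.
            ConsumerRecord<String, String> record =
                KafkaTestUtils.getSingleRecord(consumer, "payments");
            assertTrue(record.value().contains("PAYMENT_SUCCESSFUL"));
        }
    }
}
```

Notice that the test never references the Order Service or any downstream consumer: it only publishes to one topic and reads from another, mirroring the service’s actual view of the world.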
Comparing the two implementations above, the impact of the different communication styles on the test code should be apparent. These service tests operate only on message topics. In short — service tests for event-driven systems operate on events for inputs and events for outputs. Another by-product of a decoupled architecture is often an absence of mocks and stubs in the service test code, which stems from inherent ‘unawareness’ the service has of other services.
Service tests for event-driven systems operate on events for inputs and events for outputs and are unaware of other services.
The final level of tests we will discuss here is end-to-end tests. This is the ultimate ‘it works’ validation that ensures the contracts covered in the service tests ‘meet’. In other words, these tests connect the dots between individual services and ensure that what service A needs is, in fact, what service B is delivering. They typically execute post-deployment, against a test environment. They take longer to run and provide less focus in case of a failure, so they should be treated as the last bastion of defense.
These tests operate from a standpoint resembling that of a real user. Individual services and the specifics of the messaging platform are abstracted away, meaning that rarely, at this level, will we interact with Kafka directly. Here is what the System Under Test looks like:
Does this mean that at the end-to-end level of testing, the architecture of our system will not impact our test code? Well, it depends. Event-driven systems often have entirely different semantics from synchronous, point-to-point systems. Take our order processing system, for example. In the point-to-point system, the Order Service, which orchestrates all the underlying operations, may keep the original HTTP request open and return a response only once the order has been processed through all of the components. A Java test could look like this:
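Here is a sketch using only the JDK’s built-in HttpClient. The base URL, the endpoint, the payload, and the response shape (a JSON body with a `status` field) are all assumptions about the deployed test environment:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OrderEndToEndTest {

    static final String BASE_URL = "https://orders.test.example.com"; // hypothetical

    // Naive extraction of the "status" field from a JSON body, for illustration only;
    // a real test would use a proper JSON library.
    static String extractStatus(String json) {
        Matcher m = Pattern.compile("\"status\"\\s*:\\s*\"(\\w+)\"").matcher(json);
        return m.find() ? m.group(1) : null;
    }

    void orderIsFullyProcessedWithinTheRequest() throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(BASE_URL + "/orders"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(
                "{\"items\": [{\"sku\": \"ABC-1\", \"quantity\": 1}]}"))
            .build();

        // The orchestrator holds the request open until payment, shipment, inventory,
        // and notification have all completed, so one synchronous call is enough.
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());

        assert response.statusCode() == 200;
        assert "COMPLETED".equals(extractStatus(response.body()));
    }
}
```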
As you can see, every aspect of an order getting processed hinges on the original request, and as soon as it is complete, the processing has finished. This might look quite different in an event-driven system, where processing happens asynchronously:
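A sketch of the same flow against the event-driven variant follows. The POST now returns immediately with a QUEUED status, and the test must poll until the choreographed services have all reacted. The URLs, payloads, status codes, and the order id are the same kind of assumptions as before, and the inline polling helper stands in for a library like Awaitility:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.time.Instant;
import java.util.function.Predicate;
import java.util.function.Supplier;

class AsyncOrderEndToEndTest {

    static final String BASE_URL = "https://orders.test.example.com"; // hypothetical

    // Re-invoke the supplier until the predicate passes or the timeout elapses.
    static <T> T pollUntil(Supplier<T> supplier, Predicate<T> done,
                           Duration timeout, Duration interval) throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        T value = supplier.get();
        while (!done.test(value)) {
            if (Instant.now().isAfter(deadline)) {
                throw new AssertionError("condition not met within " + timeout);
            }
            Thread.sleep(interval.toMillis());
            value = supplier.get();
        }
        return value;
    }

    void orderIsEventuallyCompleted() throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> created = client.send(
            HttpRequest.newBuilder()
                .uri(URI.create(BASE_URL + "/orders"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                    "{\"items\": [{\"sku\": \"ABC-1\", \"quantity\": 1}]}"))
                .build(),
            HttpResponse.BodyHandlers.ofString());

        // The immediate response only acknowledges receipt of the order.
        assert created.statusCode() == 202;
        assert created.body().contains("QUEUED");

        // Poll the order resource until every downstream service has done its part.
        String finalBody = pollUntil(
            () -> {
                try {
                    return client.send(
                        HttpRequest.newBuilder()
                            .uri(URI.create(BASE_URL + "/orders/order-123")) // id would come from the response
                            .build(),
                        HttpResponse.BodyHandlers.ofString()).body();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            },
            body -> body.contains("COMPLETED"),
            Duration.ofSeconds(30), Duration.ofSeconds(1));

        assert finalBody.contains("COMPLETED");
    }
}
```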
You will notice the asynchronous nature of the system manifests itself in the intermediary QUEUED status of the order, as well as in the need to poll for status updates. This is often a desirable characteristic in a system, increasing its responsiveness, especially if downstream processes may take a long time to complete, as with an external payment gateway transaction.
To summarize, there is a tax to be paid when testing asynchronous, event-driven systems. This should not, however, discourage you from opting for this kind of architecture, as the tax is a one-time payment associated with developing new patterns for automated testing and overall system design. In the long run, these systems tend to age better than monolithic or even point-to-point microservice architectures, thanks to their highly decoupled components. Tooling and supporting technologies are still evolving and are not as mature as with the traditional approach; however, patterns are starting to emerge.
Thanks to Amber Houle and Jesse Diaz for providing feedback on this article.