Bridge Integrity — Integration Testing Strategy for EventBridge Based Serverless Architectures

Sarah Hamilton
Feb 23 · 7 min read

Introduction

We’ve written before about the significance of Amazon EventBridge and its use in modern Serverless architectures. Event-driven architectures built on EventBridge allow us to avoid the pitfall of the “distributed monolith” and create scalable, loosely coupled systems. There is, however, a challenge: testing these architectures.

For the purpose of this article we are going to consider an event-driven, microservice-based architecture that uses EventBridge as the only communication mechanism between microservices. The architecture design has come out of a DDD workshop like EventBridge Storming. We’re going to focus on how we can use an integration testing approach to ensure event communications between services are tested.

A generic scenario to test…

A diagram of 2 services which are connected via EventBridge. The first service creates an order and the second service writes an invoice to S3.
EventBridge makes it easy to create loosely coupled event driven architectures because the services don’t need to be aware of each other

Let’s zoom in on an interaction between 2 services that are decoupled and linked via EventBridge. The Order Service does some processing within its Lambda function and, as a result, sends an event to EventBridge. The event triggers a Lambda function in the Payments Service, where a PDF invoice is created and stored in S3. We’ll base our testing around this generic example.

Testing Strategies

Whilst this article focuses on integration testing EventBridge, it’s important to note that a mix of end-to-end (E2E), integration and unit tests is essential to have confidence in your architecture.

An E2E test could encapsulate the entire flow of our example — testing that when an order is created, an invoice is stored in the S3 bucket. E2E tests give confidence that the entire flow is functional and that the correct outcome is achieved. However, it is tricky to find the root cause of an error when an E2E test fails, as the error could be occurring at any part of the flow. This is where testing parts of the flow becomes important — integration tests.

A diagram showing the flow split into 2 integration tests. The Order Service is tested by the first test and the Payments service is tested by the second

When using integration tests, we check that the desired outcome of the individual service is obtained. In the example we split this into 2 integration tests. If an error occurs, we’ll be able to diagnose which service the error is happening in!

Integration test 1 is testing the processing of the CreateOrder function and asserting that the event was fired. Integration test 2 is asserting that when an event is received, the PDF has been stored in S3.

The Need for Ephemeral Test Stacks

Serverless architectures are best tested using integration testing and E2E testing on the deployed infrastructure. This ensures that we are testing a real environment which is as close to the production environment as possible. Mocking is rarely appropriate unless you want to test for failures such as service outages, and even then the fully managed chaos engineering service that AWS is releasing in 2021 (AWS Fault Injection Simulator) can help. Instead, real scenarios should run on the real underlying services.

The pay-per-use pricing model and rapid deployability of Serverless services make spinning up temporary testing environments on a per pull request basis fast and cheap. At the beginning of the CI/CD process a temporary stack is deployed and used to test on. We covered the details of the need for ephemeral stacks in our previous Serverless Flow article. We keep our stack names unique by using the pull request’s number or branch name in the stack name.
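As an illustration of the naming scheme, here is a minimal sketch of deriving a stack name from a branch name or pull request number. The function name and the service prefix are hypothetical. The sanitising reflects CloudFormation's naming rules: letters, digits and hyphens only, starting with a letter, at most 128 characters.

```javascript
// Hypothetical helper: builds a unique, CloudFormation-safe stack name.
// Assumes `serviceName` itself already starts with a letter.
function buildStackName(serviceName, branchOrPr) {
  const suffix = String(branchOrPr)
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse any other characters into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
  // Stack names can be at most 128 characters long.
  return `${serviceName}-${suffix}`.slice(0, 128);
}

// Example: a branch name and a pull request number.
console.log(buildStackName("order-service", "feature/ADD_invoice.pdf"));
console.log(buildStackName("order-service", 42));
```

On CI the second argument would typically come from whatever variable the CI provider exposes for the branch or pull request.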

A note on stack limits: AWS limits the number of CloudFormation stacks that can exist in an account and region at any one time, so we need a teardown process at the end of our testing. This can be achieved by a scheduled (cron) Lambda, automatic cleanup on merge to main, or a lightweight stack management API that allows stacks to be claimed, released and even reused.
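A scheduled cleanup could be sketched along these lines. This is an illustrative sketch rather than our actual teardown code: the helper names, the name prefix and the age threshold are assumptions, and `cloudFormation` is assumed to be an AWS SDK for JavaScript v3 aggregated client (for example `new CloudFormation({})` from `@aws-sdk/client-cloudformation`), whose methods return promises. Only the first page of `listStacks` results is handled.

```javascript
// Hypothetical helper: picks the ephemeral stacks that are old enough to delete.
function selectStaleStacks(stacks, { prefix, maxAgeHours, now = new Date() }) {
  const cutoff = now.getTime() - maxAgeHours * 60 * 60 * 1000;
  return stacks
    .filter((stack) => stack.StackName.startsWith(prefix))
    // listStacks also returns recently deleted stacks, so skip those.
    .filter((stack) => stack.StackStatus !== "DELETE_COMPLETE")
    .filter((stack) => new Date(stack.CreationTime).getTime() < cutoff)
    .map((stack) => stack.StackName);
}

// Hypothetical helper: deletes every stale stack and returns their names.
// Pagination (NextToken) is deliberately left out of this sketch.
async function tearDownStaleStacks(cloudFormation, options) {
  const { StackSummaries = [] } = await cloudFormation.listStacks({});
  const staleNames = selectStaleStacks(StackSummaries, options);
  for (const StackName of staleNames) {
    await cloudFormation.deleteStack({ StackName });
  }
  return staleNames;
}
```

Keeping the selection logic in a pure function makes it easy to unit test the cleanup rules without touching AWS.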

The CI process from opening a pull request to tearing down the ephemeral stack.
Serverless Flow — The CI process from opening a pull request to tearing down the ephemeral stack.

We use Jest, a feature-rich JavaScript test runner, combined with the AWS SDK to trigger events and make assertions.

Assert the event was fired

Diagram to show we are focussing on the first integration test

For the first integration test we test the invocation of the OrderCreated lambda, its business logic, and subsequently the event being fired.

In the first integration test we need to assert that the event has been fired. Our approach is to add an EventBridge rule, our “Event Interceptor”, to the custom event bus used by the service. The rule matches all events and sends them to an SQS queue. The interceptor can be set up either via direct calls to the AWS SDK in a Jest beforeAll, or via infrastructure as code (IaC) with a toggle for non-production environments. This is a feature of sls-test-tools but we have omitted it here for simplicity. Inside our integration tests we can then use the AWS SDK to receive the event via long polling with Amazon SQS.
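For the direct SDK route, an interceptor could be sketched roughly as follows. This is a simplified illustration, not the sls-test-tools implementation: the helper names and the target Id are hypothetical, the match-all pattern (an empty prefix on `source`) is one assumed way of catching every event, and `eventBridge` and `sqs` are assumed to be AWS SDK for JavaScript v3 aggregated clients (for example `new EventBridge({})` and `new SQS({})`), whose methods return promises.

```javascript
// Hypothetical helper: creates an SQS queue and a catch-all rule that forwards
// every event on the given bus to that queue.
async function createEventInterceptor({ eventBridge, sqs }, { eventBusName, name }) {
  const { QueueUrl } = await sqs.createQueue({ QueueName: name });
  const { Attributes } = await sqs.getQueueAttributes({
    QueueUrl,
    AttributeNames: ["QueueArn"],
  });
  const queueArn = Attributes.QueueArn;

  const { RuleArn } = await eventBridge.putRule({
    Name: name,
    EventBusName: eventBusName,
    // Assumption: an empty prefix on "source" matches every event on the bus.
    EventPattern: JSON.stringify({ source: [{ prefix: "" }] }),
    State: "ENABLED",
  });

  // The queue must explicitly allow EventBridge to deliver messages to it.
  await sqs.setQueueAttributes({
    QueueUrl,
    Attributes: {
      Policy: JSON.stringify({
        Version: "2012-10-17",
        Statement: [{
          Effect: "Allow",
          Principal: { Service: "events.amazonaws.com" },
          Action: "sqs:SendMessage",
          Resource: queueArn,
          Condition: { ArnEquals: { "aws:SourceArn": RuleArn } },
        }],
      }),
    },
  });

  await eventBridge.putTargets({
    Rule: name,
    EventBusName: eventBusName,
    Targets: [{ Id: "interceptor-queue", Arn: queueArn }],
  });

  return { queueUrl: QueueUrl, ruleName: name };
}

// Hypothetical helper: removes the rule and queue again, e.g. in afterAll.
async function destroyEventInterceptor({ eventBridge, sqs }, { eventBusName, ruleName, queueUrl }) {
  await eventBridge.removeTargets({
    Rule: ruleName,
    EventBusName: eventBusName,
    Ids: ["interceptor-queue"],
  });
  await eventBridge.deleteRule({ Name: ruleName, EventBusName: eventBusName });
  await sqs.deleteQueue({ QueueUrl: queueUrl });
}
```

In a test file, the first helper would be called from beforeAll and the second from afterAll, so each run starts with an empty queue.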

To get set up we need to configure our test environment with our AWS credentials — we use the AWS SDK to do this. Our ephemeral stack will have a number of dynamic parameters that change depending on the AWS region we are testing on, the AWS profile we are operating as and the CloudFormation stack under test. These could come from environment variables, but we find a better developer experience by changing these on the command line. Therefore we do a basic extension of Jest’s arguments using process.argv and some basic string splitting. On CI these are populated by a combination of branch name and environment variables.

This snippet was taken from our sls-test-tools which we’ll be releasing soon… but we’ve included basic versions of all the code needed to make this approach work independently.
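A minimal sketch of this kind of argument handling might look like the following. It is not the sls-test-tools code: the flag names (`--stack`, `--profile`, `--region`), the `STACK_NAME` environment variable and the helper names are assumptions made for illustration.

```javascript
// Hypothetical helper: extracts `--name=value` pairs from an argv-style array.
function parseCliArgs(argv) {
  const args = {};
  for (const token of argv) {
    if (!token.startsWith("--")) continue;
    const separator = token.indexOf("=");
    if (separator === -1) continue; // ignore bare flags such as --verbose
    args[token.slice(2, separator)] = token.slice(separator + 1);
  }
  return args;
}

// Hypothetical helper: command-line values win, environment variables are the
// fallback (which is what CI would typically populate).
function resolveTestConfig(argv = process.argv, env = process.env) {
  const cli = parseCliArgs(argv);
  return {
    stackName: cli.stack ?? env.STACK_NAME,
    profile: cli.profile ?? env.AWS_PROFILE ?? "default",
    region: cli.region ?? env.AWS_REGION,
  };
}

// Example with an explicit argv array instead of the real process.argv.
console.log(
  resolveTestConfig(["node", "jest", "--stack=order-service-pr-42", "--region=eu-west-1"], {})
);
```

The resolved values can then be used to configure the SDK clients and to look up the resources of the stack under test.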

We now trigger the first Lambda function and long poll on the SQS queue to assert that the event has been fired.

  • Setting waitTimeSeconds to a value greater than 0 is what distinguishes long polling from short polling
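The trigger-and-poll step could be sketched as follows. The helper names, function name and payload are hypothetical, and `lambda` and `sqs` are assumed to be AWS SDK for JavaScript v3 aggregated clients (for example `new Lambda({})` and `new SQS({})`), whose methods return promises. The sketch also assumes the interceptor rule forwards the full event, so each SQS message body is the event JSON.

```javascript
// Hypothetical helper: invokes the Lambda function under test with a JSON payload.
async function invokeLambda(lambda, functionName, payload) {
  return lambda.invoke({
    FunctionName: functionName,
    Payload: Buffer.from(JSON.stringify(payload)),
  });
}

// Hypothetical helper: long polls the interceptor queue and returns the parsed
// EventBridge events found in the message bodies.
async function pollForEvents(sqs, queueUrl, waitTimeSeconds = 10) {
  const { Messages = [] } = await sqs.receiveMessage({
    QueueUrl: queueUrl,
    WaitTimeSeconds: waitTimeSeconds, // > 0 means long polling (SQS allows up to 20)
    MaxNumberOfMessages: 10,
  });
  return Messages.map((message) => JSON.parse(message.Body));
}

// Example use inside a Jest test (all names are placeholders):
//   await invokeLambda(lambda, "create-order-function", { orderId: "123" });
//   const events = await pollForEvents(sqs, interceptorQueueUrl);
//   expect(events.some((e) => e["detail-type"] === "OrderCreated")).toBe(true);
```

With long polling the call returns as soon as a message arrives, or with no messages once the wait time has elapsed, so the test does not need its own polling loop for this step.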

Now, to make our assertions we’ve added “custom matchers” to Jest’s expect functionality. These, along with a number of other assertion helpers for Serverless integration testing, are included in the sls-test-tools library we use internally and will be releasing soon. We’ve included a simplified snippet of how these can be configured in Jest below.
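As a rough sketch of the mechanism, a custom matcher is an object method that returns `pass` and a `message` function, registered through `expect.extend`. The matcher name `toIncludeEvent` and its argument shape are hypothetical and are not the sls-test-tools API.

```javascript
// Hypothetical matcher: passes when the received list of EventBridge events
// contains one with the expected source and detail-type.
const eventMatchers = {
  toIncludeEvent(receivedEvents, expected) {
    const pass = receivedEvents.some(
      (event) =>
        event.source === expected.source &&
        event["detail-type"] === expected.detailType
    );
    return {
      pass,
      message: () =>
        `expected ${pass ? "no event" : "an event"} with source ` +
        `"${expected.source}" and detail-type "${expected.detailType}"`,
    };
  },
};

// Register with Jest when running under it; `expect` is a Jest global.
if (typeof expect !== "undefined") {
  expect.extend(eventMatchers);
}

// Example use inside a Jest test (values are placeholders):
//   expect(events).toIncludeEvent({ source: "order", detailType: "OrderCreated" });
```

The payoff is readability: the test states what should have happened on the bus, and the failure message explains which event was missing.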

Assert Received

Diagram to show that we are now focussing on the second integration test

For the second integration test we inject an event via EventBridge using the following snippet of code.
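A minimal sketch of such an injection is shown here. It is an illustration rather than the sls-test-tools helper: the function name and the example event values are hypothetical, and `eventBridge` is assumed to be an AWS SDK for JavaScript v3 aggregated client (for example `new EventBridge({})`), whose methods return promises.

```javascript
// Hypothetical helper: puts a single event onto the bus and returns its EventId.
async function injectEvent(eventBridge, { eventBusName, source, detailType, detail }) {
  const result = await eventBridge.putEvents({
    Entries: [
      {
        EventBusName: eventBusName,
        Source: source,
        DetailType: detailType,
        Detail: JSON.stringify(detail), // Detail must be a JSON string
      },
    ],
  });
  // putEvents reports per-entry failures in the response instead of throwing.
  if (result.FailedEntryCount > 0) {
    const [{ ErrorCode, ErrorMessage }] = result.Entries;
    throw new Error(`putEvents failed: ${ErrorCode} ${ErrorMessage}`);
  }
  return result.Entries[0].EventId;
}

// Example use inside a Jest test (values are placeholders):
//   await injectEvent(eventBridge, {
//     eventBusName: "my-event-bus",
//     source: "order",
//     detailType: "OrderCreated",
//     detail: { orderId: "123" },
//   });
```

Checking `FailedEntryCount` matters in tests: without it, a rejected event would surface later as a confusing missing file rather than as a failed injection.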

Side note: Some of this boilerplate functionality to trigger events to EventBridge, as well as pre-built assertions and spy mechanisms will be released as part of sls-test-tools which is coming soon…

For the second integration test we check that when the event is fired, the Lambda function in the payments service receives the event and writes the PDF to S3.

Let’s put an event onto the event bus which should trigger our second Lambda to add a file to S3. Now, eventually consistent results are challenging to say the least. For simplicity we will use a sleep function to ensure the process has had enough time to trigger the Lambda and write to S3. This isn’t the most elegant way: we typically prefer to retry a number of times before accepting failure, but will keep a sleep here for simplicity.
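Both options can be sketched in a few lines. The `sleep` helper is the simple fixed wait described above. `retryUntil` is a hypothetical helper showing the retry alternative, with attempt count and delay chosen arbitrarily for illustration.

```javascript
// Fixed wait: resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs `check` until it returns a truthy value, waiting
// between attempts, and rethrows the last error once attempts run out.
async function retryUntil(check, { attempts = 5, delayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt += 1) {
    try {
      const result = await check();
      if (result) return result;
      lastError = new Error("check returned a falsy result");
    } catch (error) {
      lastError = error;
    }
    if (attempt < attempts) await sleep(delayMs);
  }
  throw lastError;
}

// Example use inside a Jest test (the check function is a placeholder):
//   await sleep(5000);                                      // simple approach
//   const file = await retryUntil(() => findInvoiceFile()); // retry approach
```

The retry version usually finishes sooner on a healthy stack and tolerates slow cold starts better, at the cost of slightly more code.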

In our case we check that the file written to S3 has the appropriate content type. This makes up the 2nd integration test.
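The content type check could be sketched like this. The helper name, bucket and key are hypothetical, `application/pdf` is our assumption for a PDF invoice, and `s3` is assumed to be an AWS SDK for JavaScript v3 aggregated client (for example `new S3({})`), whose methods return promises.

```javascript
// Hypothetical helper: reads only the object's metadata (no download) and
// returns its content type. headObject rejects if the object does not exist,
// so the call belongs after the wait for the Lambda to finish.
async function getObjectContentType(s3, bucket, key) {
  const { ContentType } = await s3.headObject({ Bucket: bucket, Key: key });
  return ContentType;
}

// Example use inside a Jest test (bucket and key are placeholders):
//   const contentType = await getObjectContentType(s3, "invoice-bucket", "invoices/123.pdf");
//   expect(contentType).toBe("application/pdf");
```

Using `headObject` keeps the assertion cheap, since only the metadata is fetched rather than the PDF itself.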

Conclusion

In this blog post we have discussed how we use integration testing to have confidence in our event-driven architectures. We base our testing strategy around:

  • Using short lived test stacks to match the production environment
  • Combining Jest with the AWS SDK to run tests and make assertions
  • Using Jest custom matchers to create readable full stack Serverless assertions
  • Creation of an event interceptor rule on EventBridge to forward events to SQS via infrastructure as code or direct calls to the AWS SDK
  • Asserting that events have been fired via long polling on the SQS queue
  • Asserting the correct behaviour when events are received through event injection via the AWS SDK

As we undergo a paradigm shift to event-driven architectures, it’s important that our testing approach keeps up with new evolving technologies such as Amazon EventBridge.

Serverless Transformation is a Theodo initiative. Theodo helps companies leverage Serverless through expert delivery and training teams.
Learn more about Theodo and speak to one of our teams today!
