Testing Event-Driven Applications Using EventBridge — A Four-Year Journey

Steve Morland
9 min read · Aug 4, 2023


Summary

In 2019 Amazon Web Services released Amazon EventBridge, a fully managed, serverless event router. Built on CloudWatch Events, EventBridge offers more features and usability for application developers.

In late 2019 / early 2020, I was fortunate to be working on a project where we could adopt this technology over more established serverless options such as SQS and SNS for our event routing.

Since then EventBridge has become a great tool for helping to build distributed event-driven applications. It’s both simple and powerful, and it has seen a number of exciting feature enhancements (such as Scheduler and Pipes) delivered over the last four years.

Our Example Service

This article will illustrate some points with a fairly standard service pattern.

Our example application

Above, you can see a Producer Service, an API that uses a Lambda function to write data to a DynamoDB table. From there, a DynamoDB stream carries each change written to the table to a second Lambda function, which processes the data and puts an event on the shared Amazon EventBridge instance.

The “envelope” of the event includes a “source” and “detail-type” that matches the Event Rule in our Consumer Service; the rule invokes the Lambda function that writes the event to a separate table.

Note: This example service is just that, an example; it has none of the additional services that would improve its resiliency and redundancy.

Producers and Consumers

When testing applications that use Event-Driven Architecture, testing the Consumer part is much easier to implement; an application reacting to the event asynchronously shares many of the testing characteristics of synchronous operations.

Our consumer service

After sending an event to the bus, we check whether it had the desired effect on the downstream services, such as storing an entry in a DynamoDB table or a file in an S3 bucket.

Our producer service

Producers are generally a mix of synchronous and asynchronous behaviour; being able to test the end-to-end of the service is much more complex.

Jest test suite interacting with our API Gateway

For instance, as illustrated above, if we are using a Jest test suite, we first make an API request to our API Gateway instance; we can then synchronously test that the response, the result of our business logic, is what we expect.

Jest test suite checking the data store for the correct data

We can also test that the data store reflects the correct data.

Jest test suite’s requirement to test the produced event to give end-to-end validation

As we move out of the synchronous band and into the asynchronous band of behaviour, we start to have to test out of the bounds of our service. The event is sent, but to validate it in our test suite, we need to capture it.

Note: the illustrated services are rudimentary, and you will probably end up with services that are both consumers and producers. They will need to be tested with both aspects accounted for.

A note on ephemeral environments

Using ephemeral or disposable environments to run tests against is one of the great benefits of Serverless technology. The speed, ease, and cost of creating a like-for-like copy of the production infrastructure is unparalleled.

At Leighton, we use a technique based on the branch name, such as feature/ft-101, when opening a pull request: we split out the feature code (ft-101) and use it as the namespace for our Serverless, CDK, or CloudFormation stack.

The same feature code is injected into the test suite to make it transferable and targeted to any deployed infrastructure.
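As a rough illustration of the branch-name technique, the sketch below derives a feature code from a branch such as feature/ft-101 and uses it to name a stack. The function names and the normalisation rules (lower-casing, replacing unsupported characters with hyphens) are assumptions made for this example, not Leighton's actual tooling.

```typescript
// Hypothetical helper: derive a namespace from a Git branch name.
// The "feature/ft-101" shape comes from the article; the clean-up rules are assumptions.
function featureCodeFromBranch(branch: string): string {
  // Take the last path segment: "feature/ft-101" -> "ft-101".
  const segment = branch.split("/").pop() ?? "";
  // Lower-case, swap anything other than letters, digits and hyphens for "-",
  // then trim hyphens from both ends.
  const cleaned = segment
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, "-")
    .replace(/^-+|-+$/g, "");
  if (cleaned.length === 0) {
    throw new Error(`Cannot derive a feature code from branch "${branch}"`);
  }
  return cleaned;
}

// Hypothetical naming scheme: "<service>-<feature code>".
function stackNameFor(service: string, branch: string): string {
  return `${service}-${featureCodeFromBranch(branch)}`;
}
```

The same function can run in the deployment pipeline and in the test suite's setup, so both agree on which deployed stack a given pull request refers to.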

Capturing events… test infrastructure

Currently, to test events, you will need some test infrastructure: an EventBridge instance that your service can send events to and receive events from, plus some infrastructure to capture the events and make them visible to your test suite.

For the EventBridge instance, you will have to make decisions about using the default instance, creating a custom instance, and, if you do, where that instance lives.

For the rest of the infrastructure, you have a number of choices; the three that come to mind are:

  • Event Rule and SQS
  • Event Rule, Lambda and DynamoDB Table
  • Event Rule, Lambda(s), DynamoDB Table, and an API Gateway WebSocket

As you can see, these grow in infrastructure complexity.

Event capture with Event Rule and SQS

Event Rule and SQS: The least infrastructure, with trade-offs. An SQS queue can take a while to deploy and, because SQS is a distributed queue, you may have to poll the SQS endpoint a number of times to collect every captured event.
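The repeated polling can be wrapped in a small helper. The sketch below is one possible shape, not code from any particular library: `fetchBatch` is a hypothetical function standing in for a single receive call against the queue, injected so the example does not depend on the AWS SDK, and the attempt count and delay are arbitrary defaults.

```typescript
// One poll of the queue; in a real suite this would wrap a receive call to SQS.
type FetchBatch<T> = () => Promise<T[]>;

// Poll until an item satisfies `isMatch`, or give up after `maxAttempts` polls.
async function pollUntil<T>(
  fetchBatch: FetchBatch<T>,
  isMatch: (item: T) => boolean,
  maxAttempts = 10,
  delayMs = 500,
): Promise<T | undefined> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const batch = await fetchBatch();
    for (const item of batch) {
      if (isMatch(item)) return item;
    }
    // Wait before the next poll, but not after the final attempt.
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return undefined;
}
```

Returning `undefined` rather than throwing leaves it to the test to decide whether a missing event is a failure or the expected outcome.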

Event capture with DynamoDB and Lambda

Event Rule, Lambda, and DynamoDB Table: In a more complex architecture, events are stored in a table as they are captured, with the Jest suite checking directly in the table. This may not be as scalable as other solutions, as you will need to decide on a strategy for storing and accessing the data in the table, paying attention to making entries unique to stop false positives.
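One possible key strategy, sketched below purely as an assumption, is to partition captured events by stage, source and detail-type, and to sort them by timestamp plus the EventBridge event ID. The PK/SK layout and the function name are hypothetical; the envelope fields are the standard ones shown later in this article.

```typescript
// The envelope fields the capture Lambda needs in order to build a key.
interface CapturedEnvelope {
  id: string; // event ID assigned by EventBridge
  source: string;
  "detail-type": string;
  time: string; // ISO 8601 timestamp, e.g. "2022-02-01T18:41:53Z"
}

interface CaptureKey {
  PK: string;
  SK: string;
}

// Hypothetical key layout: one partition per stage + source + detail-type,
// with entries ordered by time and made unique by the event ID.
function captureKeyFor(stage: string, envelope: CapturedEnvelope): CaptureKey {
  return {
    PK: `${stage}#${envelope.source}#${envelope["detail-type"]}`,
    SK: `${envelope.time}#${envelope.id}`,
  };
}
```

Because timestamps in the same ISO 8601 UTC format sort chronologically as strings, a test could query only for sort keys at or after the moment it started, which helps keep events from earlier runs out of its assertions.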

Event capture with Lambda, DynamoDB, and API Gateway WebSocket

Event Rule, Lambda(s), DynamoDB Table, and an API Gateway WebSocket: This more complex architecture can deliver events to the Jest client as soon as they are captured, through the WebSocket connection, with no polling required.

A shared bus is often a noisy bus

Sharing a bus between services is often a noisy affair; when multiple ephemeral environments target the same bus, you will need a strategy that works for you for name-spacing the events.

In the past, when we initially came up against this, we name-spaced the event envelope with the build stage name.

{
  "version": "0",
  "id": "0d079340-135a-c8c6-95c2-41fb8f496c53",
  "detail-type": "order.created",
  "source": "myapp.orders.ft-101", // <<< `myapp.orders.${STAGE}`
  "account": "123451235123",
  "time": "2022-02-01T18:41:53Z",
  "region": "us-west-1",
  "detail": {...}
}

Here you can see we altered the source to include the stage name, which is predictable but, looking back, unnecessary.

There are many ways you could achieve the event namespace; though I was never a fan of this, it did achieve a positive end result.
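To make the stage-suffixed source concrete, here is a minimal sketch of building the namespaced source and the matching event patterns. All function names are hypothetical. The `prefix` matcher is a standard part of EventBridge event patterns; `sourceMatches` is only a local stand-in for reasoning about the patterns in a unit test and is not EventBridge's own matching engine.

```typescript
// One element of a pattern's "source" array: an exact string or a prefix matcher.
type SourceMatcher = string | { prefix: string };

interface SourcePattern {
  source: SourceMatcher[];
}

// "myapp.orders" + "ft-101" -> "myapp.orders.ft-101", as in the envelope above.
function namespacedSource(baseSource: string, stage: string): string {
  return `${baseSource}.${stage}`;
}

// Pattern for a rule that should only see events from one ephemeral stage.
function patternForStage(baseSource: string, stage: string): SourcePattern {
  return { source: [namespacedSource(baseSource, stage)] };
}

// Pattern for a rule that should see the service's events from every stage.
function patternForAllStages(baseSource: string): SourcePattern {
  return { source: [{ prefix: `${baseSource}.` }] };
}

// Local approximation of how a rule would match the "source" field.
function sourceMatches(pattern: SourcePattern, source: string): boolean {
  for (const matcher of pattern.source) {
    if (typeof matcher === "string") {
      if (matcher === source) return true;
    } else if (source.indexOf(matcher.prefix) === 0) {
      return true;
    }
  }
  return false;
}
```

The cost of this approach shows up here too: every producer and every rule has to agree on how the stage is appended, which is one more thing that can drift.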

From Serverless Framework to CDK, bye to shared buses

During the last three years, Leighton has also migrated to CDK as our infrastructure-as-code tool of choice. Because CDK constructs CloudFormation programmatically, we can include a dedicated EventBridge instance in our build at the stages where we want to run automated tests against the service.

Each service build has a dedicated EventBridge instance. This removed our need to name-space events and reduced the chance of error that it introduced.

Event standardisations

As EDA continues to gain traction, we, as users, are finding the need to develop standardisation. There are a number of tools out there that add great value to the development of Event-Driven Applications. Investing in your events to set them up correctly will pay off in the long run.

Async API — https://www.asyncapi.com/en

A number of tools and specifications to help you build events that will scale and adapt to the business.

Sheen Brisals: The power of Amazon EventBridge is in its detail

This is a great read for anyone embarking on designing events that get the most out of EventBridge. From this, we adopted the standard schema we work with for our events.

EventBridge Atlas — https://eventbridge-atlas.netlify.app/

The best tool to catalogue, visualise and share events across your organisation. We share our Atlas through NPM, using the schemas to validate events that are captured in the Jest suite. It is especially useful for large organisations with a distributed team topology.

Test tooling

SLS Test Tools — https://github.com/aleios-cloud/sls-test-tools

sls-test-tools provides a range of utilities, setup, teardown, and assertions to make writing effective and high-quality integration tests for Serverless Architectures on AWS easier.

A great set of tools for building test suites. Their EventBridge tools use the “Event Rule and SQS” method mentioned above, but the infrastructure is deployed via the JS SDK.

Leighton’s EventNet Logo

EventNet — Leighton’s internal event testing tools

Soon to be released to the public, these are the tools we’ve been using to help capture and analyse events. They include:

  • A CDK construct using the “WebSocket” method mentioned above, which builds a repeatable WebSocket interface attached to your desired bus.
  • A JS client for handling the collection of events from the WebSocket.
  • A custom Jest assertion for testing against the published schema. We publish our event schemas in a shared NPM repo, and this tool validates events against the latest published schema, eliminating any drift from the published specification.

Example JSON schema
Example Event that validates with the previous schema

JSON Schema is a powerful specification for the shape and make-up of an object. We’ve been using it with API Gateway to validate request bodies for quite some time; it offers a powerful way to express the shape of an event and is much more robust than strong typing alone. I think validating Async APIs in a similar way can’t be a bad thing.
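To show the idea without pulling in a dependency, the sketch below checks an object against a deliberately tiny subset of what JSON Schema can express: required keys and primitive types on top-level properties. It is a simplified stand-in and not a JSON Schema validator; a real suite would use an established validator library with the published schemas. The type and function names are hypothetical.

```typescript
type PrimitiveType = "string" | "number" | "boolean" | "object";

// A cut-down schema shape covering only `required` and top-level `properties.type`.
interface MiniSchema {
  required: string[];
  properties: Record<string, { type: PrimitiveType }>;
}

// Returns a list of human-readable problems; an empty list means the object passed.
function validateAgainst(schema: MiniSchema, candidate: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const key of schema.required) {
    if (!(key in candidate)) errors.push(`missing required property "${key}"`);
  }
  for (const key of Object.keys(schema.properties)) {
    const rule = schema.properties[key];
    // Note: typeof reports "object" for arrays and null, which a real validator distinguishes.
    if (key in candidate && typeof candidate[key] !== rule.type) {
      errors.push(`property "${key}" should be of type ${rule.type}`);
    }
  }
  return errors;
}
```

Returning every problem at once, rather than stopping at the first, tends to produce more useful test failure messages when an event drifts from its schema.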

The EventNet Web Client Displays a captured event

A Web Client for viewing, saving, and copying events is useful for some developers to visualise the events without using lots of unnecessary logging.

I’ll share more information about EventNet when we release it in the next few weeks.

Note: As we use SLS Test Tools quite heavily, this has been designed to work with that library in a complementary way.

An Example of Test and Validation — Leaning into JSON Schemas

All of this comes together into an example test.

We’ve set up our Jest suite using an NPM-distributed Event Catalogue, from which we can pull the latest agreed schemas.

We use the EventNet client to connect to the WebSocket and make an API request with Axios; we can then await a match on the produced events.

We can check the keys in the synchronous response from API Gateway and also check that the DynamoDB table contains the PK and SK we expect (via the SLS Test Tools Jest assertions).

We can also check that EventNet caught an event, that the event matches the schema, and that certain keys match their expected values.

Finally, we close the client, having tested the data flowing through the full application.

Note: This example is not testing every key in the objects; for full coverage, you would want to test all the values in the object.
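The "await a match" step in the flow above can be sketched with a small in-memory collector. This is not the EventNet client, whose API is not shown in this article; `EventCollector`, `push` and `waitFor` are hypothetical names, and in a real suite `push` would be called from the WebSocket message handler.

```typescript
interface BusEvent {
  source: string;
  "detail-type": string;
  detail: Record<string, unknown>;
}

interface Waiter {
  predicate: (candidate: BusEvent) => boolean;
  resolve: (found: BusEvent) => void;
}

class EventCollector {
  private received: BusEvent[] = [];
  private waiters: Waiter[] = [];

  // Record an incoming event and resolve any waiters it satisfies.
  push(incoming: BusEvent): void {
    this.received.push(incoming);
    const stillWaiting: Waiter[] = [];
    for (const waiter of this.waiters) {
      if (waiter.predicate(incoming)) waiter.resolve(incoming);
      else stillWaiting.push(waiter);
    }
    this.waiters = stillWaiting;
  }

  // Resolve with the first matching event, already received or yet to arrive;
  // reject if nothing matches within `timeoutMs`.
  waitFor(predicate: (candidate: BusEvent) => boolean, timeoutMs = 5000): Promise<BusEvent> {
    for (const seen of this.received) {
      if (predicate(seen)) return Promise.resolve(seen);
    }
    return new Promise<BusEvent>((resolve, reject) => {
      const timer = setTimeout(() => {
        this.waiters = this.waiters.filter((w) => w !== waiter);
        reject(new Error(`No matching event within ${timeoutMs} ms`));
      }, timeoutMs);
      const waiter: Waiter = {
        predicate,
        resolve: (found) => {
          clearTimeout(timer);
          resolve(found);
        },
      };
      this.waiters.push(waiter);
    });
  }
}
```

Checking already-received events first matters in practice: the event may arrive before the test reaches its `waitFor` call, and the assertion should still pass.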

Round Up

  1. EventBridge is very simple to use and powerful; as such, you can make mistakes early on that will affect how you use it as your business adopts it more and more. Invest in it heavily, and it will make your life easier.
  2. Invest in designing your events to be scalable and suitable for the long term.
  3. Invest in tooling to set the events up to be useful in your organisation; making the events visible to users/developers will help them to succeed.
  4. There are tools out there to help you; if they don't fit your needs, build your own.
  5. Use a standard event spec and JSON Schemas to validate your events.
  6. Go and build. Event-Driven Architecture and the underlying tools are exciting to work with; adopting them can transform your organisation.
