Load Testing Serverless / Asynchronous Applications: Practical Considerations

Mario Bittencourt
Published in SSENSE-TECH
7 min read · Jul 19, 2024

Load testing is a common practice we follow at SSENSE to regularly assess how well-prepared we are to sustain the ever-increasing demands we face. While it can be fairly straightforward for smaller, traditionally hosted applications, things may not be so clear when we have distributed components that communicate asynchronously, including those backed by serverless technologies.

In this article, I will discuss some of the nuances of these technologies that should be considered when planning load tests.

Load Testing 101

In today’s world, it is rare to find a capability that is offered end-to-end by a single application. The norm is a set of connected services, with a mix of orchestration and choreography, working together to deliver the functionality our users need.

Figure 1. Business capabilities are served by leveraging a collection of applications.

If that functionality is exposed to the public internet, you have to assume that the traffic you face will fluctuate over time, perhaps in a predictable way, such as at specific hours of the day, or unpredictably, when an (un)intentional action goes viral.

Figure 2. Spike in demand for your application over time.

With those two things in mind, your next responsibility is to determine if your system will be ready if/when the spike occurs. This depends on your system’s performance, elasticity, and scalability characteristics.

As the number of services involved grows, and as they differ in those architectural characteristics, there is only so much you can confirm through modeling alone. You should put your system under a load that mimics what you expect, and then some.

Figure 3. A traditional approach to load testing your application.

In its simplest form, you can look at your public interfaces and, based on projections, send traffic to those interfaces and measure how the system behaves. You can then compare the results with your performance SLAs and provide the answer, with additional context indicating at which scale you would likely start missing those SLAs.

That information can help determine, ahead of time, if you need to make changes to the architecture and implementation to cope with this potential traffic.
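To make this concrete, below is a minimal sketch of such a test using Locust, an open-source load-testing tool. The host, endpoint paths, payloads, and task weights are hypothetical placeholders; substitute your own public interfaces and traffic projections.

```python
# Minimal load test sketch using Locust (https://locust.io).
# Paths, payloads, and weights are placeholders for your own interfaces.
from locust import HttpUser, task, between


class StorefrontUser(HttpUser):
    # Each simulated user pauses 1-3 seconds between requests.
    wait_time = between(1, 3)

    @task(3)  # weight 3: browsing is three times more frequent than buying
    def view_product(self):
        self.client.get("/products/12345")

    @task(1)
    def add_to_cart(self):
        # A state-mutating request, exercised less often.
        self.client.post("/cart", json={"productId": "12345", "quantity": 1})
```

Running it with, for example, `locust -f loadtest.py --host https://load-test.example.com --users 500 --spawn-rate 50 --headless` ramps up to 500 concurrent users and reports RPS and latency percentiles, which you can then compare against your SLAs.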

Challenges

While the idea of the load test is simple, its execution comes with a set of challenges. For example, you likely do not want to load test your production environment, as this could generate a lot of test data in your persistence layer, which would have to be filtered or removed later to save on costs and avoid skewing reports. Additionally, it could hinder the availability of your system for real users, compromising the SLAs and revenue.

The first challenge is that you must provide a separate environment that is architecturally the same as the production one. This means making sure all services and their connections are deployed, mimicking production, at least during load tests. Anyone who has done this knows that it’s not an easy feat, even if you have adopted infrastructure as code (IaC) to automate this task.

Figure 4. Production and Test environments need to match or be as close as possible for the load test.

The next challenge is guaranteeing that data dependencies are met. Your load test will have to simulate real traffic, which means that it will likely require pre-existing data to support the execution of the tests. While this is easier for read-only and isolated tests, complications can arise when your service mutates the system’s state and/or has dependencies on the state of other services.

Figure 5. Certain scenarios depend on previous successes on the same application.
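One way to meet these data dependencies is to seed the prerequisite state before the run. The sketch below assumes a hypothetical orders API in the test environment; the endpoint, payload, and response shape are illustrative only.

```python
# Hypothetical seeding script: create the orders that a "cancel order"
# load test scenario will later mutate. Endpoint and payload are examples.
import requests

BASE_URL = "https://load-test.example.com"  # test environment, never production


def seed_orders(count: int) -> list[str]:
    order_ids = []
    for i in range(count):
        resp = requests.post(
            f"{BASE_URL}/orders",
            json={"customerId": f"loadtest-{i}", "items": [{"sku": "SKU-1", "qty": 1}]},
            timeout=10,
        )
        resp.raise_for_status()
        order_ids.append(resp.json()["orderId"])
    return order_ids


if __name__ == "__main__":
    # Seed enough orders for the planned test volume, plus headroom for retries.
    ids = seed_orders(1000)
    print(f"Seeded {len(ids)} orders")
```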

Asynchronous Twist

Once one of the functionalities we want to provide is backed by an asynchronous component, our existing testing model no longer works in the same fashion.

Figure 6. Mixed process with synchronous and asynchronous components.

In this example, we see that the application is not aware of the asynchronous components, in a classic fire-and-forget mode. This means that by simply looking at the load testing results we would conclude that the application performs adequately, without exposing potential struggles downstream.

Figure 7. What if the asynchronous part can't handle the influx rate?

This happens because there is no direct connection between the test done on the synchronous side and the ramifications of these executions on the asynchronous side.

But how can we perform our load test and still obtain meaningful results? Let’s adjust our approach.

The performance metric (RPS/latency) is kept only for the synchronous part, as it is still the user-facing aspect and needs to be protected. Any asynchronous execution is measured not by RPS but by the execution time, from the moment the event/command is placed in the messaging infrastructure to the moment its processing completes.

Figure 8. We measure the time from the moment a message is dropped at the queue to the moment it has been processed.

Because it is asynchronous, you can receive a huge number of messages that would simply pile up, waiting to be processed. By focusing on the execution time from the moment a message is available, we are measuring our capability to consume and act on those messages.

This way, we factor in any downstream dependencies that need to have their concurrency controlled to avoid overloading.

If this number goes beyond a certain threshold defined by the business, it means that while we are capable of receiving those requests, we will not be able to maintain certain customer promises.
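As an illustration, assuming an AWS setup where a Lambda function consumes from an SQS queue: SQS already stamps every message with a SentTimestamp attribute, so the consumer can compute the queue-to-completion time without any producer changes. A minimal sketch:

```python
# Sketch of consumer-side measurement for an SQS-triggered AWS Lambda.
# SQS sets SentTimestamp (epoch milliseconds) on every message, letting us
# measure time-in-queue plus processing time, not just processing time.
import time


def process(body: str) -> None:
    ...  # placeholder for your actual business logic


def handler(event, context):
    # Lambda entry point for an SQS event source.
    for record in event["Records"]:
        sent_ms = int(record["attributes"]["SentTimestamp"])

        process(record["body"])

        # Queue wait time + processing time, in milliseconds.
        end_to_end_ms = time.time() * 1000 - sent_ms
        # Emit as a structured log or metric your observability stack aggregates.
        print({"metric": "async_completion_ms", "value": round(end_to_end_ms)})
```

Aggregating the p95/p99 of this metric during the load test tells you whether the asynchronous side keeps up with the influx rate, or whether messages are piling up.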

Another point to highlight is that you have the opportunity to test directly from the asynchronous part of your process.

Figure 9. Sometimes you can load test the asynchronous parts of your process directly.

If the inputs are self-contained, this allows you to simplify the test process and load test the sync and async parts separately.
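For example, assuming the asynchronous part consumes from an SQS queue and the messages are self-contained, a sketch like the following injects messages at a controlled rate; the queue URL and payload are placeholders.

```python
# Sketch: load test the asynchronous part directly by publishing messages
# to its queue at a steady, controlled rate. Values are placeholders.
import json
import time

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue"


def inject(rate_per_second: int, duration_seconds: int) -> None:
    for second in range(duration_seconds):
        start = time.monotonic()
        for i in range(rate_per_second):
            sqs.send_message(
                QueueUrl=QUEUE_URL,
                MessageBody=json.dumps({"orderId": f"load-{second}-{i}"}),
            )
        # Sleep off the remainder of the second to hold the target rate.
        elapsed = time.monotonic() - start
        if elapsed < 1:
            time.sleep(1 - elapsed)


inject(rate_per_second=100, duration_seconds=60)
```

For higher rates you would batch with send_message_batch and fan out across multiple workers, since a single sequential loop tops out well below what the queue itself can absorb.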

Cloud/Serverless Means Infinite Scale, Right?

Well, not so fast!

When first introduced, the elastic nature of the infrastructure provided by the cloud, and more recently serverless, led to a false interpretation of what it could mean for the scalability of our applications.

While the promise is that it can scale by offering resources we can use at a fast pace, it does not automatically make our application capable of linearly scaling with them.

Cloud providers also impose quotas on how many resources you can request and/or how fast they can be scaled. This means that, if a sudden spike takes place, you may be prevented from scaling at all, or you may see a delay before those new resources become available.
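For instance, on AWS you can inspect the account-level Lambda concurrency limits ahead of time. A quick sketch with boto3 (other providers expose similar information):

```python
# Sketch: inspect account-level Lambda limits before planning a load test.
import boto3

lambda_client = boto3.client("lambda")
limits = lambda_client.get_account_settings()["AccountLimit"]

print("Max concurrent executions:", limits["ConcurrentExecutions"])
print("Unreserved concurrency:", limits["UnreservedConcurrentExecutions"])
```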

So, serverless-backed applications also deserve to be load-tested, albeit with some particularities.

Your focus should be less on whether serverless can scale or not. Instead, the idea is to confirm that it can scale fast enough, and whether doing so has any unintended consequences.

Your load tests would then uncover and stress the cloud account limits. If you hit a quota, the affected resource will be throttled, which manifests as degraded application performance or, if unhandled, outright failures.
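Continuing with the AWS example, after a test run you can check whether a function was throttled by querying CloudWatch; the function name and time window below are placeholders.

```python
# Sketch: detect Lambda throttling during the load-test window via CloudWatch.
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
end = datetime.now(timezone.utc)

resp = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="Throttles",
    Dimensions=[{"Name": "FunctionName", "Value": "process-order"}],  # placeholder
    StartTime=end - timedelta(hours=1),
    EndTime=end,
    Period=60,
    Statistics=["Sum"],
)

throttled = sum(point["Sum"] for point in resp["Datapoints"])
print(f"Throttled invocations in the last hour: {throttled:.0f}")
```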

Figure 10. Understanding the resource scaling capabilities/limitations is key.

Maximizing Load Testing Efficiency

Load testing should be included as part of your regular preparation. Although many application workloads have seasonal patterns, it is important to consider how your application handles sudden bursts.

Doing so will allow you to proactively identify problems that go unnoticed under regular loads.

As we explored in this article, when it comes to serverless and asynchronous applications, two aspects differentiate them from a load-testing perspective:

  1. Simply measuring the response time from the synchronous part is not enough, as it does not provide the full picture.
  2. While serverless applications can have their resources scaled massively, your architecture and cloud provider quotas can affect how effectively you can use these resources.

Because asynchronous parts usually face retries, whether due to errors or in response to downstream concurrency limitations, the direct latency measurement does not apply. Instead, measuring the time from when the message is placed in the messaging infrastructure to when its execution completes offers a better metric.

If your workload uses any serverless component, focus on finding out how the limits the cloud provider sets affect your application, and any others sharing the same account.

By regularly conducting these load tests, you will be better prepared to address any issues they uncover.

Happy Testing!

Editorial reviews by Catherine Heim & Sam-Nicolai Johnston.

Want to work with us? Click here to see all open positions at SSENSE!
