AWS Performance Testing: Ensuring Smooth Migration to Serverless

Dimcho Karakashev
Published in SSENSE-TECH
8 min read · Feb 24, 2023

Performance testing has long been a crucial process at SSENSE. The platform is often subject to significant traffic variability associated with the release of high-demand products, and performance testing helps us maintain stability and balance costs through these fluctuations. In the SSENSE-TECH article Benchmarking Microservices on Kubernetes, published in 2020, we covered a wide range of practices and tools that can be used to ensure platform reliability. Fast forward to 2022, and many of these are still applicable today.

However, our organization has evolved, and with it, our technology has shifted. More specifically, we have been migrating core services to serverless for reasons such as removing single points of failure and increasing reliability, among others. On the SSENSE platform team, we transitioned one of the core services we maintain to serverless, but we found a shortfall in terms of performance testing, because all of our existing tests were designed for microservices deployed in Kubernetes. As a team, we asked ourselves: should we do performance testing at all if AWS says that a particular service has certain Service Level Objectives? If yes, what process should we follow?

In this article, we start by giving a brief overview of the service subject to performance testing. Then, we describe how we approached the performance-related questions and what we learned. Lastly, we discuss how we can improve moving forward.

Architecture of Service Under Performance Testing

The system subject to redesign, and under performance testing, handles the pubsub communication between microservices. It is a foundational component of the SSENSE platform that supports our Event-Driven Architecture. A simplified view of the legacy service, based on Kubernetes, can be visualized with the following high-level diagram:

Figure 1: High-level overview of the legacy pubsub architecture. As shown in the diagram, publishers send events to our microservice, which processes each message and delivers it to the consumers via an SNS topic. Additionally, events are sent to Kinesis, where they are eventually routed to an S3 bucket.

Figure 1 shows the legacy design: a microservice that handles and routes received events to the appropriate consumers, and also stores each event in S3. This microservice supports two endpoints: a legacy endpoint with custom message processing kept for historical reasons, and a new endpoint with a simplified message flow. This architecture was able to support our traffic to a certain extent; however, it suffered from being a single point of failure. The revised architecture addressing these problems can be seen in the following diagram:

Figure 2: A high-level overview of the final pubsub architecture, highlighting the two designs handling requests: one for the legacy endpoint, with the additional processing required, and one for the new endpoint, with a simplified message flow.

Figure 2 provides a high-level overview of the new pubsub architecture, which addresses the challenges faced in the legacy system. For example, the architecture supports flexible infrastructure deployment across multiple accounts. As a result, the design removes the single point of failure, and issues can be contained within a single AWS account.

Performance Testing Preparation

After we completed our initial implementation of the new pubsub architecture, we deemed it necessary to run a performance test. Such testing would help ensure that AWS service quotas are aligned, and that lambdas are optimized and configured to handle traffic bursts. For example, by default API Gateway supports 10,000 requests per second per account, while EventBridge supports between 400 and 10,000 requests per second, depending on the region. If not addressed, this misalignment in quotas could cap the supported traffic of the whole system at the level of its most constrained service.
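This kind of quota review can be scripted against the Service Quotas API. The following is a minimal sketch using the AWS SDK for JavaScript v3; the service codes listed are assumptions about which services sit in our request path, and only the first page of quotas is printed for each service, so real usage would paginate with NextToken.

```typescript
import { ServiceQuotasClient, ListServiceQuotasCommand } from "@aws-sdk/client-service-quotas";

const client = new ServiceQuotasClient({ region: "us-east-1" });

// Service codes for the services assumed to be in the request path.
const serviceCodes = ["apigateway", "events", "sns", "kinesis", "lambda"];

async function main(): Promise<void> {
  for (const serviceCode of serviceCodes) {
    // Only the first page of quotas is fetched here.
    const { Quotas } = await client.send(new ListServiceQuotasCommand({ ServiceCode: serviceCode }));
    for (const quota of Quotas ?? []) {
      console.log(`${serviceCode} | ${quota.QuotaName}: ${quota.Value}${quota.Adjustable ? " (adjustable)" : ""}`);
    }
  }
}

main().catch(console.error);
```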

Before starting the optimization process, we took the time to design the workload; more specifically, to define what we were asking our system to do. Our goal was to thoroughly understand how the system is used, such as the extent to which each endpoint is exercised. We relied on historical data to compare the designs, evaluating the differences using metrics like latency and client/server error rates. Once we completed the analysis of the historical data, we created benchmark scripts to compare system performance.
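To give an idea of what this analysis can look like in practice, the sketch below pulls historical tail latency from CloudWatch using the AWS SDK for JavaScript v3. The API name is a placeholder, and the metric choice is illustrative; the same approach works for error-rate metrics such as 4XXError and 5XXError.

```typescript
import { CloudWatchClient, GetMetricStatisticsCommand } from "@aws-sdk/client-cloudwatch";

const cloudwatch = new CloudWatchClient({ region: "us-east-1" });

// Pull two weeks of hourly p99 latency for a given API Gateway REST API.
async function fetchP99Latency(apiName: string): Promise<void> {
  const now = Date.now();
  const result = await cloudwatch.send(
    new GetMetricStatisticsCommand({
      Namespace: "AWS/ApiGateway",
      MetricName: "Latency",
      Dimensions: [{ Name: "ApiName", Value: apiName }],
      StartTime: new Date(now - 14 * 24 * 60 * 60 * 1000),
      EndTime: new Date(now),
      Period: 3600,                // one data point per hour
      ExtendedStatistics: ["p99"], // tail latency matters more than the average here
    })
  );
  for (const dp of result.Datapoints ?? []) {
    console.log(dp.Timestamp, dp.ExtendedStatistics?.p99);
  }
}

fetchP99Latency("pubsub-api").catch(console.error); // "pubsub-api" is a placeholder API name
```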

Asking the right questions and defining the workload correctly was key, as misalignment could lead to over-optimization or to improving the wrong parts of the system. For instance, simulating a sudden spike in traffic is vastly different from simulating an increase over a few minutes. Optimizing for the former could result in a significant cost increase, as it might require expensive configurations such as provisioned concurrency that are not necessary in practice.
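To make the distinction concrete, here is a minimal load-shaping sketch (the endpoint URL and event body are placeholders, and Node 18+ is assumed for the global fetch API). It contrasts a sudden spike with a gradual ramp; only the traffic profile differs, yet the two exercise scaling very differently.

```typescript
type Profile = (second: number) => number; // target requests per second at a given second

const spike: Profile = (s) => (s < 10 ? 50 : 1000);       // jumps from 50 to 1,000 rps at t=10s
const ramp: Profile = (s) => Math.min(1000, 50 + 15 * s); // climbs to 1,000 rps over ~1 minute

async function run(profile: Profile, durationSeconds: number, url: string): Promise<void> {
  for (let second = 0; second < durationSeconds; second++) {
    const batch = Array.from({ length: profile(second) }, () =>
      fetch(url, {
        method: "POST",
        headers: { "content-type": "application/json" },
        body: JSON.stringify({ type: "order.created", payload: { id: "perf-test" } }),
      }).catch(() => undefined) // individual failures are tolerated; we only shape the load here
    );
    await Promise.all(batch);   // crude pacing: the next second starts once the batch settles
  }
}

// Example: run(ramp, 120, "https://pubsub.test.example.com/events").catch(console.error);
```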

Performance Testing of Legacy Endpoint

To optimize the legacy endpoint design — the one depicted in Figure 2 that involves multiple lambdas — we deemed it crucial to run a performance test. Lambdas introduce variability that makes it impossible to rely solely on quotas to determine the most optimal configuration. We had to ensure that the lambdas would run with optimal memory and CPU.

We started by optimizing the lambdas using AWS Lambda Power Tuning. It is essential to use a payload that closely resembles production data, because the tool makes real downstream calls to services even when you pass mock data. In other words, the workload should mimic, as closely as possible, the actual behavior of the system under test in order to produce meaningful results. We varied the number of executions and ran them in parallel to understand the lambdas’ behavior, taking the downstream AWS services into account. We were able to use parallel execution because each function execution was entirely independent of the others. The benefits we gained from leveraging parallel execution are aptly summarized in this GitHub thread. Using AWS Lambda Power Tuning allowed us to set the memory properly and choose the configuration with the best cost for a small invocation time. The graph below shows the results for 20 parallel executions, revealing that 1 GB of memory was a good fit for this lambda.

Figure 3: Example results of running the AWS Lambda Power Tuning tool for one of our lambdas.
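For illustration, the tool is deployed as a Step Functions state machine, and a tuning run can be started with an input similar to the one below. This is a minimal sketch using the AWS SDK for JavaScript v3; the ARNs and payload are placeholders, and the input fields follow the interface documented in the tool’s README.

```typescript
import { SFNClient, StartExecutionCommand } from "@aws-sdk/client-sfn";

const sfn = new SFNClient({ region: "us-east-1" });

// Input for the aws-lambda-power-tuning state machine.
const tuningInput = {
  lambdaARN: "arn:aws:lambda:us-east-1:123456789012:function:pubsub-router", // placeholder ARN
  powerValues: [256, 512, 1024, 1536, 2048], // memory configurations to compare
  num: 20,                                   // invocations per memory configuration
  parallelInvocation: true,                  // safe here: each execution is independent
  strategy: "cost",                          // optimize for cost at an acceptable duration
  payload: { type: "order.created", payload: { id: "perf-test" } }, // production-like event
};

async function main(): Promise<void> {
  const { executionArn } = await sfn.send(
    new StartExecutionCommand({
      stateMachineArn: "arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine", // placeholder
      input: JSON.stringify(tuningInput),
    })
  );
  console.log(`Started tuning run: ${executionArn}`);
}

main().catch(console.error);
```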

After optimizing the lambda memory and CPU, we used our knowledge of Kubernetes practices to implement a similar process for serverless. In Kubernetes, we focus on setting effective Horizontal Pod Autoscaling (HPA) policies. For instance, we configure a minimum number of pods so that already-running pods can absorb an influx of traffic. Without this practice, new pods take time to spin up, which can result in upstream errors and potential incidents. Lambda provisioned concurrency serves a similar purpose by keeping lambda instances ready to handle traffic spikes.

Fortunately, by using serverless, we didn’t have to do multiple performance test runs to figure out the HPA triggers for scaling on memory and CPU (as described in our previous article on Kubernetes), as that is handled by AWS. If all the available lambda instances are processing requests, AWS automatically spins up new ones. These new instances suffer from what are called cold starts, which add a significant amount of execution time. But since our service is called asynchronously, the additional time was not a huge concern. The bigger concern was setting the minimum provisioned concurrency so that spikes in traffic do not cause the lambda to exceed the AWS burst quota.
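As an illustration, provisioned concurrency can be set on a published version or alias with a single API call. The sketch below uses the AWS SDK for JavaScript v3; the function name, alias, and concurrency value are placeholders rather than our actual configuration.

```typescript
import { LambdaClient, PutProvisionedConcurrencyConfigCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

async function main(): Promise<void> {
  await lambda.send(
    new PutProvisionedConcurrencyConfigCommand({
      FunctionName: "pubsub-router",       // placeholder function name
      Qualifier: "live",                   // provisioned concurrency applies to a version or alias
      ProvisionedConcurrentExecutions: 20, // sized from baseline traffic, not the peak
    })
  );
}

main().catch(console.error);
```

Because provisioned concurrency is billed for as long as it is configured, regardless of whether it is used, it is worth sizing it from baseline traffic rather than the worst-case peak.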

Another similarity between Kubernetes and serverless is that we configured our service to prevent the over-utilization of shared resources. With Kubernetes, you set a maximum number of pods in the HPA policy to ensure that the space in your nodes is not used up by a single service. In a similar fashion, lambda functions can be configured with reserved concurrency to ensure that account resources are not overtaken by a single service. If reserved concurrency is not set, a lambda can consume all of the account’s concurrency, leaving other lambdas without reserved concurrency unable to scale and leading to throttling across the account.
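Similarly, reserved concurrency is a one-call configuration. The following sketch again uses the AWS SDK for JavaScript v3, with a placeholder function name and an illustrative limit.

```typescript
import { LambdaClient, PutFunctionConcurrencyCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({ region: "us-east-1" });

async function main(): Promise<void> {
  await lambda.send(
    new PutFunctionConcurrencyCommand({
      FunctionName: "pubsub-router",     // placeholder function name
      ReservedConcurrentExecutions: 200, // hard ceiling for this function, leaving the rest of
                                         // the account's concurrency available to other lambdas
    })
  );
}

main().catch(console.error);
```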

After using AWS Lambda Power Tuning to optimize our lambdas, and reviewing the historical data from our legacy architecture, we estimated the required quotas for each AWS service, such as API Gateway, EventBridge, and SNS. We opened AWS support tickets to increase the quotas, and then executed the performance tests. During the tests, we faced a few challenges.

The first challenge we faced was related to Kinesis throughput. Although we had increased the throughput, our CloudFormation stack had to be destroyed and recreated, which caused the increased Kinesis limit to be lost without us realizing it. Facing this issue helped us uncover an unhandled exception caused by Kinesis throttling, which led to significant cold starts, as shown in Figure 4.

Since reserved concurrency was not set at the time, the lambda used almost all of the account’s available concurrency. Our performance test helped us prevent a potential incident. To address this, we not only ensured that reserved concurrency was set, but also modified our design by adding an SQS queue to manage the ingestion rate. This modification was possible because the process could tolerate the added latency. Lastly, it is important to mention that executing multiple runs and experimenting with provisioned concurrency incurred a substantial cost.

Figure 4: Example of the impact of the unhandled throttling exception.
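To illustrate one possible shape of this mitigation, the sketch below catches the Kinesis throttling exception explicitly and redirects the event to an SQS buffer instead of letting the error go unhandled. The stream and queue names are placeholders, and the exact position of the queue in our final design may differ from this simplified fallback structure.

```typescript
import { randomUUID } from "node:crypto";
import { KinesisClient, PutRecordCommand } from "@aws-sdk/client-kinesis";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const kinesis = new KinesisClient({});
const sqs = new SQSClient({});

export async function forwardEvent(event: object): Promise<void> {
  const data = JSON.stringify(event);
  try {
    await kinesis.send(
      new PutRecordCommand({
        StreamName: "pubsub-events",   // placeholder stream name
        PartitionKey: randomUUID(),
        Data: new TextEncoder().encode(data),
      })
    );
  } catch (err) {
    if ((err as Error).name === "ProvisionedThroughputExceededException") {
      // Buffer the event in SQS; a queue-triggered consumer drains it at a controlled rate.
      await sqs.send(
        new SendMessageCommand({
          QueueUrl: "https://sqs.us-east-1.amazonaws.com/123456789012/pubsub-buffer", // placeholder
          MessageBody: data,
        })
      );
      return;
    }
    throw err; // anything else is still unexpected and should surface
  }
}
```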

Performance Configuration for New Endpoint

The second part of the system used the same AWS services as the first — API Gateway, EventBridge, SNS, Kinesis — but without lambdas. Designs without lambdas can be achieved by leveraging built-in AWS functionality as much as possible, as described in the article Mapping Templates: Transforming Your Payloads Within AWS API Gateway. We thoroughly reviewed the quotas for each AWS service and monitored the performance of the new pubsub implementation as other services interacted with it during performance testing. Based on the expected traffic, we determined that we were not going to exceed the default account quotas, so we decided it wasn’t necessary to run a performance test. Keeping track of each limit also helps identify when a quota will need to change as traffic grows. While skipping the performance test saved us time and cost, we would have tested if there had been any doubt about a quota or a misconfiguration.

Conclusion

This article explored how we approached performance testing after redesigning a service, moving it from a Kubernetes cluster to AWS serverless. For the design that involved lambdas, we used the AWS Lambda Power Tuning tool to find the memory configuration that balances cost and execution time.

We performed multiple tests, which enabled us to find issues such as missing exception handling for Kinesis throttling. This underscores the importance of stress testing the system even if your solution is based on managed and serverless services.

For the parts of the solution that did not involve lambda functions, it was possible to rely on our traffic expectations and review AWS documentation to determine the required quotas.

If possible, try to conduct stress tests to uncover potentially overlooked aspects and make sure your application reacts accordingly.

Editorial reviews by Catherine Heim & Mario Bittencourt

Want to work with us? Click here to see all open positions at SSENSE!
