Cutting the Serverless Roundtrip Time

Michael Balber
CyberArk Engineering
6 min read · Mar 14, 2023

Have you ever experienced a slow API request that demanded you take action?

As an architect developing a cutting-edge serverless web application on AWS, I experienced exactly this. Fortunately, my colleague and I investigated the root cause of the poor performance and ended up reducing the response time by 7.2 seconds.

In this blog, I’ll walk you through the steps we took to improve performance so that you can apply them to your project.

Warning! A Very Slow Starting Point with API Calls

Our journey starts with the most straightforward measurement — browser developer tools.

After examining the network tab, we noticed the roundtrip time for an API call was 8.5 seconds!

For web application users and developers, this is an eternity. Upon further investigation, the first thing that caught our eye was that a preflight HTTP call preceded every regular HTTP call.

Learning about it was our first task.

What is a preflight and why should you examine it?


I’m not proud to admit it, but I’ve seen the preflight request several times and never carefully examined it.

A preflight request is a cross-origin resource sharing (CORS) request that checks whether the server understands the CORS protocol and allows the actual request's method and headers. The preflight is not sent for the HTTP methods and content types defined as simple requests (e.g., a GET request, or a POST with a text/plain content type).

Since our application/json content type does not qualify as a simple request, we had to implement a response to the OPTIONS method in our Amazon API Gateway for each URL. The response lists the allowed origins in the Access-Control-Allow-Origin header.

To avoid sending the preflight before each subsequent API call, return the HTTP header Access-Control-Max-Age in the OPTIONS response from the API Gateway. This header tells the web browser to save the response in the cache for the specified time.
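
To make this concrete, here is a minimal sketch of what such an OPTIONS response could look like, assuming a Lambda proxy integration behind API Gateway; the origin, methods, and max-age values are illustrative, not our actual configuration:

```python
# A hypothetical OPTIONS (preflight) handler for a Lambda proxy integration.
def handler(event, context):
    return {
        "statusCode": 204,  # No Content: a preflight response needs no body
        "headers": {
            # Only this origin may call the API from a browser (illustrative value).
            "Access-Control-Allow-Origin": "https://app.example.com",
            "Access-Control-Allow-Methods": "GET,POST,PUT,DELETE,OPTIONS",
            "Access-Control-Allow-Headers": "Content-Type,Authorization",
            # Tells the browser to cache this preflight response for one hour,
            # skipping the extra roundtrip before subsequent API calls.
            "Access-Control-Max-Age": "3600",
        },
        "body": "",
    }
```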

Caching the preflight response saved us 0.7 seconds.

After resolving this minor issue, we moved on to the major one: a backend roundtrip that took 7 seconds.

Leading Backend Flow

Many of our backend flows go through a data access layer (DAL).

Here is the process that we used:

  • Amazon API Gateway as the API layer for our backend services
  • Synchronous invocation of two AWS Lambdas
  • Database access

Here is a diagram that shows the backend flow that we analyzed and improved:

This complex flow requires sophisticated latency analysis tools like AWS X-Ray.

Analyze with AWS X-Ray

AWS X-Ray is a distributed tracing tool that provides a complete view of requests as they travel through your application and services. We used it to analyze the timing of all the involved services and components, hoping to find the bottlenecks, and we succeeded in uncovering many of them.
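
For reference, instrumenting a Python Lambda for X-Ray can be as small as the sketch below, assuming the aws-xray-sdk package is bundled and active tracing is enabled on the function; the subsegment name is illustrative:

```python
from aws_xray_sdk.core import patch_all, xray_recorder

# Instruments boto3, requests, and other supported libraries so that
# downstream calls show up as subsegments in the X-Ray trace.
patch_all()

def handler(event, context):
    # A custom subsegment to time a specific section of the handler.
    with xray_recorder.in_subsegment("database-access"):
        ...  # query the database
```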

Cache Authorization Everywhere


We use Amazon Cognito as an authentication service. On sign-in, Amazon Cognito generates a JSON Web Token (JWT) signed with the private key of an asymmetric key pair; the token is sent to the browser as a cookie and attached to every subsequent request.

We verify the JWT at two points of the flow:

  1. A Lambda authorizer is an API Gateway mechanism that invokes an AWS Lambda before each API call. Its built-in caching feature lets you define a caching key, such as a cookie or header, and determine how long to cache the result.
    Enabling it saved us a valuable 0.4 seconds.
  2. The data access layer’s inner Lambda pulled the Amazon Cognito user pool’s public signing key on each request to verify the JWT. Instead, we cached the key in Amazon DynamoDB, a fully managed, serverless, key-value NoSQL database. On the rare occasions when the key rotates, JWT verification fails; we then fetch the new public key from Amazon Cognito and store it back in our Amazon DynamoDB table (see the sketch after this list). You can also cache the public key at the AWS Lambda handler level and call Amazon Cognito only once per execution environment.
    Caching it saved us 0.3 seconds.
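
Here is a minimal sketch of that second pattern: caching the user pool's JWKS (the JSON document holding its public signing keys) in DynamoDB and refreshing it when verification fails. The table name, attribute names, and function signature are hypothetical, and it assumes the python-jose and requests packages:

```python
import json

import boto3
import requests
from jose import jwt
from jose.exceptions import JWTError

JWKS_URL = ("https://cognito-idp.{region}.amazonaws.com/"
            "{pool_id}/.well-known/jwks.json")

# Hypothetical cache table, created once per execution environment.
table = boto3.resource("dynamodb").Table("jwks-cache")

def _fetch_and_cache_jwks(region, pool_id):
    # Pull the current public keys from Cognito and store them in DynamoDB.
    jwks = requests.get(JWKS_URL.format(region=region, pool_id=pool_id),
                        timeout=5).json()
    table.put_item(Item={"pool_id": pool_id, "jwks": json.dumps(jwks)})
    return jwks

def verify_jwt(token, region, pool_id, client_id):
    # Prefer the cached keys; fall back to Cognito on a cache miss.
    item = table.get_item(Key={"pool_id": pool_id}).get("Item")
    jwks = json.loads(item["jwks"]) if item else _fetch_and_cache_jwks(region, pool_id)
    try:
        return jwt.decode(token, jwks, algorithms=["RS256"], audience=client_id)
    except JWTError:
        # The pool's signing key may have rotated: refresh the cache and retry.
        jwks = _fetch_and_cache_jwks(region, pool_id)
        return jwt.decode(token, jwks, algorithms=["RS256"], audience=client_id)
```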

Although caching is essential for lowering latency, the most significant performance challenge with AWS Lambda is the infamous AWS Lambda cold start.

A Freezing AWS Lambda Cold Start


An AWS Lambda cold start occurs when an invocation request is assigned to a new execution environment.

Executing code in a new execution environment happens in several stages:

  1. Download your code
  2. Start a new execution environment
  3. Execute initialization code
  4. Execute handler code

Avoiding the first two stages is the best way to deal with their performance impact. The standard way to prevent cold starts is the AWS Lambda provisioned concurrency configuration, which lets you define the minimum number of warm execution environments at any given moment. The alternative for certain simple use cases like ours is creating your own warmer, such as lambda-warmer, which periodically invokes your Lambdas to keep them warm. Running your own warmer is much cheaper, but unlike the official provisioned concurrency feature, it doesn't guarantee a minimum number of parallel execution environments.
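
The handler-side contract of such a warmer is simple. Here is a sketch, assuming a scheduled rule invokes the function with a hypothetical {"warmer": true} payload (libraries such as lambda-warmer use a similar convention):

```python
def handler(event, context):
    # Warm-up ping from the scheduler: return immediately so this
    # execution environment stays alive without doing real work.
    if event.get("warmer"):
        return {"warmed": True}
    ...  # regular request handling
```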

To speed up the third stage of new-environment provisioning, the execution of the initialization code, we followed the best practice of creating service connections globally whenever possible. In our case, we defined the Amazon Cognito and Amazon DynamoDB clients as global Python variables, creating them only at cold start rather than on each invocation.
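
In Python this simply means moving the client creation out of the handler; a minimal sketch:

```python
import boto3

# Created once per execution environment, during the initialization
# stage of a cold start, and reused by every subsequent invocation.
cognito = boto3.client("cognito-idp")
dynamodb = boto3.resource("dynamodb")

def handler(event, context):
    # Uses the already-initialized clients above; no per-invocation setup.
    ...
```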

Following those upgrades, most invocations were warm, saving up to 3.5 seconds across the two Lambdas in our flow.

Finally, after improving our software, it was time to tune our hardware.

The Power of Tuning an AWS Lambda

AWS Lambda Power Tuning is a tool that helps you find a Lambda's optimal configuration in terms of cost and performance.

When creating a new Lambda function, you must configure its memory allocation, which also determines its CPU allocation. The pricing model of AWS Lambda is based on the requested memory allocation and the execution time of each invocation. In CPU-bound use cases, increasing the memory also increases the CPU allocation, reducing the execution time. In such cases, latency may drop with only a minor impact on cost.
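
For a rough sense of the trade-off (using Lambda's per-GB-second billing and illustrative numbers): a 512MB (0.5GB) function running for 0.6 seconds consumes 0.3 GB-seconds per invocation, exactly what a 128MB (0.125GB) function consumes when it runs for 2.4 seconds. If quadrupling the memory cuts a CPU-bound function's duration by close to a factor of four, the latency win costs almost nothing.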

Additionally, you may choose the AWS Graviton processor rather than the standard x86 processor. AWS Graviton is an ARM-based processor designed by AWS to deliver the best price-performance for cloud workloads. In some use cases, it performs better at a lower price.

But how can you know which configuration best fits your Lambda function? That’s precisely what AWS Lambda Power Tuning does. It runs your Lambda function several times using different settings and shows you the cost-optimization graph.

Source: AWS Console

Here’s how to use AWS Lambda Power Tuning:

  • Install it from the AWS Serverless Application Repository, which deploys an AWS CloudFormation stack.
  • Execute the Step Functions state machine with the following parameters (see the sketch after this list):
    1. Target Lambda ARN
    2. Target Lambda payload
    3. Memory options [128, 256, 512, …]
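
For example, starting an execution from code might look like the sketch below; the ARNs are placeholders, and the input fields follow the tool's documented format:

```python
import json

import boto3

sfn = boto3.client("stepfunctions")
sfn.start_execution(
    # Placeholder ARN of the power-tuning state machine deployed above.
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:powerTuningStateMachine",
    input=json.dumps({
        # Placeholder ARN of the Lambda function to tune.
        "lambdaARN": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
        "powerValues": [128, 256, 512, 1024],  # memory configurations to test
        "num": 10,       # number of invocations per configuration
        "payload": {},   # sample event passed to the target Lambda
    }),
)
```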

The state machine displays the average cost and speed of each power configuration:

Source: AWS Lambda Power Tuning Tool

Increasing the memory from 128MB to 512MB reduced the execution time from 2.3 seconds to 0.6 seconds while only incurring a 25% cost increase.

The tuning reduced the execution time of both Lambdas by another 2.3 seconds.

Don’t Miss Out on Improving Your Own Performance

This blog showed how steps like performance analysis, caching, Lambda warming, and runtime tuning can significantly improve roundtrip time.

Personally, I was surprised by the size of the improvement. By researching and analyzing your specific use case and technology, you can uncover numerous opportunities for performance improvements, and I hope this post inspires you to do so.

Got other performance improvement ideas? Please share them in the comments below.

Thanks to Shai Dvash, who investigated the performance issue with me and contributed his knowledge to make our APIs roundtrip faster.
