AWS Lambda Cold Start — a stomping elephant in the serverless land

Size is (almost) all that matters for optimizing AWS Lambda cold starts

An AWS Lambda performance benchmark on the cold start duration growth by code size and the impact of other configuration factors.

Adrian Tanasa
11 min read · Sep 26, 2023


If Lambda cold start were an animal in the serverless land, that would be a stomping elephant. Everybody talks about it; many can’t see the world behind it, and its size is the first thing that comes to mind.

While AWS provides provisioned concurrency as the silver bullet for this problem, it comes with several trade-offs, not least its cost. Therefore, understanding the impact of cold starts on latency and the means of optimizing it is crucial for software engineers who aim to design AWS Well-Architected serverless workloads.

What is a Lambda cold start?

For anybody unfamiliar with the subject, let’s take a moment to describe the elephant in the room:

A cold start is a stage in the Lambda lifecycle consisting of downloading the code from the source location (S3) and provisioning a new environment for the function with the specified memory, runtime, and configuration.

AWS Lambda lifecycle for the standard provisioning with cold start part highlighted

It happens when no other previously provisioned instance (with the same configuration and source code) is available to handle the request for the Lambda function. According to an analysis of production Lambda workloads by AWS, cold starts typically occur in under 1% of invocations.

The “How” of size — Lambda cold start questions

It is widely publicized that code size is a significant factor impacting the cold start duration and one that developers can control directly. However, I could not find any benchmark or documentation that goes beyond the “What” statement and into the “How”:

  • How important is code size for cold start duration?
  • How does cold start duration vary with the function’s size?
  • How do other factors (memory, architecture, etc.) impact the cold start duration growth by size?

To address these questions, I am using a hypothesis-driven design approach:

  • Hypothesis — Size is the only significant factor impacting the cold start duration (within the same programming language).
  • Assumption — Creating a new execution environment (for the same configuration) takes the same amount of time, regardless of the code size.

My goal is to be able to project the average cold start duration of an AWS Lambda function based on its size and runtime.

Benchmarking cold starts with the Lambda Size Tuning workload

To measure and validate this hypothesis, I designed an experiment that evaluates the cold start duration of Lambda functions over an extensive data set on the same runtime, with code sizes doubling sequentially from 128KB up to 64MB. After measuring the results for the reference scenario, I changed only one configuration factor (such as memory or VPC) and re-ran the experiment.

The AWS workload backing this experiment is similar to AWS Lambda Power Tuning, with an API Gateway fronting a Step Functions state machine that orchestrates the calls in parallel to all the different-sized Lambda functions. The input (wait time between iterations and the number of iterations) is dynamic, allowing me to test multiple scenarios.

AWS Size tuning workload with API Gateway, Step Function and AWS Lambda functions
AWS Lambda Size Tuning
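As an illustration only, here is a minimal AWS CDK sketch of that orchestration; the helper function and construct identifiers are my own assumptions, not the actual benchmark stack. It waits for the interval passed in the execution input and then invokes every differently sized function in parallel (looping over the number of iterations is left out for brevity).

// Sketch (assumed identifiers): Step Functions state machine invoking all sized functions in parallel
import { IFunction } from 'aws-cdk-lib/aws-lambda';
import { JsonPath, Parallel, StateMachine, Wait, WaitTime } from 'aws-cdk-lib/aws-stepfunctions';
import { LambdaInvoke } from 'aws-cdk-lib/aws-stepfunctions-tasks';
import { Construct } from 'constructs';

export function buildSizeTuningStateMachine(scope: Construct, sizedFunctions: IFunction[]): StateMachine {
  // The wait time between iterations comes from the execution input.
  const waitBetweenIterations = new Wait(scope, 'WaitBetweenIterations', {
    time: WaitTime.secondsPath('$.waitSeconds'),
  });

  // One parallel branch per code size, so every function cold starts in the same iteration.
  const invokeAllSizes = new Parallel(scope, 'InvokeAllSizes', { resultPath: JsonPath.DISCARD });
  sizedFunctions.forEach((fn, index) => {
    invokeAllSizes.branch(new LambdaInvoke(scope, `InvokeSize${index}`, { lambdaFunction: fn }));
  });

  return new StateMachine(scope, 'SizeTuningStateMachine', {
    definition: waitBetweenIterations.next(invokeAllSizes),
  });
}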

To generate a relevant data set for this experiment, I ran 100 iterations with a wait time of 10 minutes between each iteration. As a result, the cold start rate reached 100 percent. To remove any potential impact of a “high cold start after a code change” on my analysis, a CI/CD automation step runs after each deployment to call the functions once before an experiment starts.

This experiment is primarily a Node.js Benchmark, with data captured in September 2023. I will also use Python for the reference scenario to compare the latency cost (ms/MB) variation between different programming languages.

To control the Lambda functions’ code size, I inject an array of dynamically generated strings (through the Node.js crypto library) up to the desired target value.

// Example of 1KB sized Lambda function
export function handler() {
  console.log('[handler] 1K lambda called');
  const jsonDoc = ["d967dd020dd09e2e4b88de237198daa1fbb99e1d136dcbba46c6a3bee1bba4546...up..to..1KB..size"];
  const randomIndex = Math.floor(Math.random() * jsonDoc.length);
  return Promise.resolve({
    statusCode: 200,
    body: JSON.stringify(jsonDoc[randomIndex]),
  });
}
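The handler above is produced from a template. As a rough sketch (my own helper, assuming hex-encoded output like in the example), the padding strings could be generated with the Node.js crypto library like this:

// Sketch (assumed helper): generating padding strings with the Node.js crypto library
import { randomBytes } from 'crypto';

export function generatePadding(targetBytes: number, chunkBytes = 64): string[] {
  const chunks: string[] = [];
  let total = 0;
  while (total < targetBytes) {
    // Hex encoding doubles the length, so each chunk adds chunkBytes * 2 characters.
    const chunk = randomBytes(chunkBytes).toString('hex');
    chunks.push(chunk);
    total += chunk.length;
  }
  return chunks;
}

// Roughly 1KB worth of strings, ready to embed in a generated handler file.
console.log(JSON.stringify(generatePadding(1024)));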

Given that AWS CDK compresses the source files before uploading them to the S3 deployment bucket, the size reported in the AWS Console is that of the compressed package and not that of the raw source code that runs on the instance. Given the randomized nature of the source code, the packaged-to-raw size ratio for this experiment (54%) should generally be higher than in real scenarios.

How important is code size for cold start duration?

I am setting the reference scenario with the following configuration for the Lambda functions:

  • Runtime: nodejs18.x
  • Memory: 256MB
  • Architecture: x86_64 (default)
  • VPC: no
  • Region: us-east-1

To aggregate the results of the experiment, I am using the CloudWatch Logs Insights with the following query over the standard REPORT entry (where initDuration reflects the cold start duration):

fields @timestamp, @duration, @billedDuration, @initDuration, coalesce(@initDuration, 0) as initDuration, @maxMemoryUsed/1000000 as memUsed, @memorySize/1000000 as memAlocated, @log, 'nodejs18.x' as runtime
| parse @log /(?<lambdaSize>([0-9]+))k/
| filter @type = "REPORT" and @log like /(?i)(Point)/
| stats count(*) as totalRequests, (sum(@initDuration) / totalWithInitDuration) as avgInitDuration, max(initDuration) as maxInitDuration, min(initDuration) as minInitDuration, count(@initDuration>0) as totalWithInitDuration, (totalWithInitDuration) as coldStartRequests, avg(@duration) as avgWarmDuration, avg(memAlocated) as avgMemAlocated, coalesce((coldStartRequests/count(*) * 100), 0) as coldStartsPercent, pct(initDuration, 90) as p90InitDuration
by lambdaSize, runtime
| sort lambdaSize desc
| display lambdaSize, avgInitDuration, minInitDuration, maxInitDuration, p90InitDuration, coldStartsPercent, totalRequests, avgWarmDuration, avgMemAlocated, runtime

For this Node.js 18 reference scenario, the results show that the duration of cold starts is significantly impacted by size, with average values from 171ms (for the 1KB function) to 3.1 seconds (for the 64MB one).

Cold start duration growth by code size — reference scenario Node.js 18 (x— KB, y — ms)

Assuming that the provisioning time for the same configuration is constant regardless of the source code size, I projected the per-megabyte cost by subtracting the cold start duration of the 1KB function and dividing the remainder by the size difference.

ratio (ms/MB) = 1024 * (initDurationN - initDuration1K) / (sizeN - 1), with sizes expressed in KB

The ratio of cold start duration to raw code size follows an inverted bell curve, with a bottom of 26ms/MB (at the 1MB size) and peaks of around 45ms/MB at the margins.

Duration per megabyte cost of increasing code size — Node.js
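As a sanity check, plugging the reference values quoted above (171ms for the 1KB function, about 3.1 seconds for the 64MB one) into that formula reproduces the upper margin of the curve:

// Worked example of the ratio formula using the reference scenario values
function costPerMb(initDurationNMs: number, initDuration1KMs: number, sizeKb: number): number {
  return (1024 * (initDurationNMs - initDuration1KMs)) / (sizeKb - 1);
}

// 64MB = 65536KB -> roughly 45.8 ms/MB, matching the ~45ms/MB margin
console.log(costPerMb(3100, 171, 64 * 1024).toFixed(1));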

Increasing memory does not reduce cold start duration

I have increased the memory to 6GB for this case, keeping the rest of the configuration per the reference scenario. The result confirms that memory does not significantly impact the duration of cold starts.

Cold start duration growth by code size — memory impact on Node.js 18

Another experiment I ran for this configuration checked whether increased memory results in a lower percentage of cold starts. Without going into details, I could not find a significant improvement, nor a relation to size, when changing the wait time between iterations to 1, 5, and 7 minutes. For the 5-minute interval, the cold start rate is between 6–8% for the 256MB memory compared to 5–8% for the 6GB one.

Using a VPC setup does not affect cold start duration

I updated the Lambda functions for this scenario with a VPC configuration. The VPC setup has no significant impact on duration at low throughput: the difference is in the low single-digit milliseconds and remains relatively constant with size.

Cold start duration growth by code size — vpc impact on Node.js 18

Changing the runtime version impacts cold start duration

For this case, I want to assess the variation between runtime versions of the same programming language, using Node.js 16 instead of Node.js 18. An improvement is visible, with the initDuration reduced for all the functions by low double-digit milliseconds.

Cold start duration growth by code size — runtime version impact on Node.js

Using ARM architecture reduces cold start duration for large-sized functions

The result of changing the Lambda architecture to arm64 is surprising, with a significant reduction of the cold start duration for the larger-sized functions. In this case, we can see an improvement of 760ms for the 64MB size.

X86_64 for Lambda functions under 4MB and ARM64 for the ones over is the optimal configuration for reducing cold start duration.

Cold start duration growth by code size — architecture impact on Node.js 18
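As a sketch only (the helper and the hard-coded threshold are my own, not part of the benchmark), the rule of thumb above could be wired into a CDK stack like this:

// Sketch (assumed helper): pick the architecture based on the 4MB raw-size threshold
import { Architecture, Code, Function as LambdaFunction, Runtime } from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';
import { statSync } from 'fs';
import { join } from 'path';

const FOUR_MB = 4 * 1024 * 1024;

export function sizeAwareFunction(scope: Construct, id: string, assetDir: string): LambdaFunction {
  // The raw (uncompressed) size of the bundled handler drives the choice.
  const rawSizeBytes = statSync(join(assetDir, 'index.js')).size;
  return new LambdaFunction(scope, id, {
    runtime: Runtime.NODEJS_18_X,
    handler: 'index.handler',
    code: Code.fromAsset(assetDir),
    architecture: rawSizeBytes < FOUR_MB ? Architecture.X86_64 : Architecture.ARM_64,
  });
}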

Enabling provisioned concurrency results in zero synchronous cold starts

I have set the provisioned concurrency to 1 for this scenario, and the results show that 16–18% of all requests have an initDuration greater than zero. The reported initDuration values are also significantly higher than those measured without provisioned concurrency.

Using AWS X-Ray, I verified that none of these requests incur an actual cold start, in the sense that the initialization does not impact the response times (a statement confirmed through AWS Support).

In provisioned concurrency mode, AWS asynchronously reprovisions new instances to replace previous ones without affecting availability. The high initDuration values suggest that AWS uses a more cost-effective mechanism than standard provisioning, in addition to creating an instance for a second availability zone and executing the initialization code.

An issue introduced by provisioned concurrency is that you cannot differentiate actual cold starts from these asynchronous cold starts using the standard Lambda logs in CloudWatch. To solve it, one must look into additional logging (such as Powertools for AWS Lambda) or alternative monitoring tools that enhance the captured metrics.

For Node.js runtimes and functions with a raw code size under 16MB, anything with an initDuration over 1 second is likely a provisioned-concurrency asynchronous cold start. The following AWS CloudWatch Logs Insights query, which I find extremely useful for provisioned concurrency with auto-scaling, excludes anything over this threshold from the reported cold starts:

fields @timestamp, @duration, @billedDuration, @initDuration, coalesce(@initDuration, 0) as initDuration, (@initDuration > 1000) as isLikelyProvisioned, coalesce(isLikelyProvisioned * @initDuration, 0) + @duration as totalSyncDuration, @maxMemoryUsed/1000000 as memUsed, @memorySize/1000000 as memAlocated, @log, 'nodejs18.x' as runtime
| parse @log /(?<lambdaSize>([0-9]+))k/
| filter @type = "REPORT" and @log like /(?i)(Point)/
| stats count(*) as totalRequests, (sum(@initDuration) / totalWithInitDuration) as avgInitDuration, max(initDuration) as maxInitDuration, min(initDuration) as minInitDuration, count(@initDuration > 0) as totalWithInitDuration, (totalWithInitDuration - isLikelyProvisionedCount) as coldStartRequests, avg(@duration) as avgWarmDuration, avg(memAlocated) as avgMemAlocated, coalesce((coldStartRequests/count(*) * 100), 0) as coldStartsPercent, avg(totalSyncDuration) as avgSyncDuration, sum(isLikelyProvisioned) as isLikelyProvisionedCount
by lambdaSize, runtime
| sort avgInitDuration desc
| display lambdaSize, avgInitDuration, minInitDuration, maxInitDuration, coldStartsPercent, totalRequests, avgWarmDuration, avgMemAlocated, runtime, isLikelyProvisionedCount, totalWithInitDuration

Using InlineCode does not reduce cold start duration

For this scenario, I used the AWS CDK InlineCode construct to inject the function’s source code into the generated CloudFormation template instead of the S3-backed “Code.fromAsset” integration. Given the 1MB hard limit on the size of a CloudFormation template, I updated only the 1KB, 128KB, and 512KB functions.
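For reference, a minimal CDK sketch of the two packaging variants compared here; the paths and construct IDs are illustrative assumptions:

// Sketch (assumed paths and IDs): S3-backed asset vs inline packaging
import { Code, Function as LambdaFunction, Runtime } from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';
import { readFileSync } from 'fs';

export function addPackagingVariants(scope: Construct): void {
  // Reference scenario: source code uploaded to S3 as a CDK asset.
  new LambdaFunction(scope, 'AssetFunction', {
    runtime: Runtime.NODEJS_18_X,
    handler: 'index.handler',
    code: Code.fromAsset('lambda/1k'),
  });

  // InlineCode scenario: source embedded in the CloudFormation template
  // (rendered through the ZipFile property), only feasible for the smallest functions.
  new LambdaFunction(scope, 'InlineFunction', {
    runtime: Runtime.NODEJS_18_X,
    handler: 'index.handler',
    code: Code.fromInline(readFileSync('lambda/1k/index.js', 'utf-8')),
  });
}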

While I expected to see an improvement, the results show a minor latency increase instead. Looking into the generated resources, I found that AWS is not keeping the source code plain but compressed with the ZipFile property.

Changing the AWS region can reduce cold start duration

In this case, I changed the reference scenario region to us-east-2. The improvement is in the order of low double-digit milliseconds for all the functions, growing slightly as the size increases.

Cold start duration growth by code size — region impact on Node.js 18

Using AWS Lambda layers reduces cold start duration for medium to large-sized functions

To test the impact of layers on the cold start duration, I moved the generated data out of the Lambda source code and into JSON files shipped as AWS Lambda Layers. The layers’ compressed size is marginally higher than that of the reference scenario’s deployment packages.

// Example of 256KB sized Lambda function with layer
import { APIGatewayProxyResult } from 'aws-lambda';
import jsonDoc from '/opt/generated.json';

export function handler(): Promise<APIGatewayProxyResult> {
  console.log('[handler] 256k layer lambda called');
  const randomIndex = Math.floor(Math.random() * jsonDoc.length);

  return (Promise.resolve({
    statusCode: 200,
    body: JSON.stringify(jsonDoc[randomIndex]),
  }) as unknown) as Promise<APIGatewayProxyResult>;
}
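The layer itself can be defined with a minimal CDK sketch like the following (the directory layout and construct IDs are my own assumptions). The layer content is mounted under /opt at runtime, which is why the handler imports /opt/generated.json.

// Sketch (assumed layout): packaging the generated JSON as a Lambda layer
import { Code, Function as LambdaFunction, LayerVersion, Runtime } from 'aws-cdk-lib/aws-lambda';
import { Construct } from 'constructs';

export function addLayeredFunction(scope: Construct): void {
  // The layer asset directory contains generated.json at its root,
  // so the file is available to the function as /opt/generated.json.
  const dataLayer = new LayerVersion(scope, 'GeneratedDataLayer', {
    code: Code.fromAsset('layers/256k'),
    compatibleRuntimes: [Runtime.NODEJS_18_X],
  });

  new LambdaFunction(scope, 'LayerFunction', {
    runtime: Runtime.NODEJS_18_X,
    handler: 'index.handler',
    code: Code.fromAsset('lambda/layer-256k'),
    layers: [dataLayer],
  });
}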

Surprisingly, the results show a significant duration reduction starting from 2MB-sized functions and gradually increasing to a maximum of 2-second savings for the 64MB-sized one.

Cold start duration growth by code size — Lambda layers impact on Node.js 18

Benchmarking Node.js for the best practical cold start performance in 2023

Next, I bring together the best-performing factors to assess their cumulative impact: code size, the nodejs16.x runtime, the us-east-2 region, and alternating x86_64 and arm64 architectures (around the 4MB threshold). I omitted the AWS Layers configuration from this experiment, as using it for all your Lambda functions is not feasible in practice.

The result is an improvement of over 30ms compared to the reference scenario — a 17% reduction for the entry-sized Lambda functions.

142 ms is the best cold start duration for Node.js in 2023 (expected average)

Cold start duration growth by code size — best case scenario for Node.js in 2023

Here is a head-to-head diagram showing the impact of various configuration factors on the Cold start duration through the code size lens:

Cold start duration growth by code size — All factors head-to-head
Cold start duration difference compared to the reference scenario (in ms)

Python has an entirely different duration-by-size growth than Node.js

To check whether the duration-to-size rate is similar between different programming language runtimes, I changed the runtime configuration of the reference scenario to python3.9.

# Example of 1KB sized Python Lambda function
import json
import random

def handler(event, context):
    print('[handler] python lambda called')
    generatedArray = ["d967dd020dd09e2e4b88de237198daa1fbb99e1d136dcbba46c6a3bee1bba4546...up..to..1KB..size"]
    randomString = random.choice(generatedArray)

    return {
        'statusCode': 200,
        'body': json.dumps(randomString)
    }

Cross-runtime performance is already a significant focus of existing benchmarks, and the expected performance improvement for Python is well reflected in the results.

Cold start duration growth by code size — Python3.9 vs Nodejs18 vs Nodejs optimized

What is more interesting in this case is that Python also has a better duration-by-size growth rate: it peaks at 26.6ms/MB at the 4MB size and then drops slightly as the size increases.

Conclusion

The results of this experiment show that code size is one of the most important factors impacting AWS Lambda cold start performance, with size optimization able to reduce the duration by hundreds of milliseconds. In addition, the AWS Region and the runtime version can improve performance by several dozen milliseconds.

“Keep an eye on the size of the serverless elephant!”

For Node.js, keeping your source code well under a 1MB threshold is feasible in most cases by keeping a keen eye on dependency sizes, minifying, and using package bundlers with dead-code removal (such as webpack tree shaking).
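As one hedged example (CDK's esbuild-based NodejsFunction rather than a webpack setup; the entry path is illustrative), minification can be switched on at the construct level, and esbuild drops unreachable code while bundling:

// Sketch (assumed entry path): esbuild-based bundling with minification
import { Runtime } from 'aws-cdk-lib/aws-lambda';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Construct } from 'constructs';

export function addBundledFunction(scope: Construct): NodejsFunction {
  return new NodejsFunction(scope, 'BundledFunction', {
    runtime: Runtime.NODEJS_18_X,
    entry: 'src/handler.ts',
    handler: 'handler',
    bundling: {
      minify: true, // strip whitespace and shorten identifiers in the deployed bundle
    },
  });
}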

Defining organization-level fitness functions that analyze the deployed package size of AWS Lambda functions and of AWS Layers (including external ones) can quickly catch problems, improve performance, reduce cost, and increase confidence in serverless technology. Teams can also shift this check left by using pre-commit scripts and CI/CD automation to warn about, and potentially block, the deployment of functions whose code size exceeds a desired threshold.
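A minimal sketch of such a fitness function (the budget and the directory of zipped packages are illustrative assumptions) could run in CI before deployment and fail the build when a package exceeds the agreed threshold:

// Sketch (assumed budget and paths): fail the build when a deployment package is too large
import { readdirSync, statSync } from 'fs';
import { join } from 'path';

const SIZE_BUDGET_BYTES = 1024 * 1024; // 1MB budget per deployment package
const packageDir = 'dist'; // directory holding the zipped deployment packages

const oversized = readdirSync(packageDir)
  .filter((file) => file.endsWith('.zip'))
  .map((file) => ({ file, sizeBytes: statSync(join(packageDir, file)).size }))
  .filter((asset) => asset.sizeBytes > SIZE_BUDGET_BYTES);

for (const asset of oversized) {
  console.error(`${asset.file} is ${(asset.sizeBytes / 1024 / 1024).toFixed(2)}MB, over the 1MB budget`);
}

if (oversized.length > 0) {
  process.exit(1); // block the deployment in CI
}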

One unexpected outcome of this experiment is the impact of AWS Lambda Layers and the ARM architecture on the cold start performance of medium-to-large-sized Lambda functions. For Node.js runtimes, this configuration can save whole seconds in the edge cases where the size cannot be optimized below the order of megabytes.

Finally, this benchmark highlights the importance of covering real-life scenarios (where the serverless functions have somewhat more than a few lines of code) and sets the 2023 cold start performance expectations for AWS serverless workloads with Lambda Node.js and Python runtimes.


Adrian Tanasa

Cloud Solutions Engineer, Serverless advocate & GenAI practitioner