The 5 Ways We Reduce Lambda Cold Starts At PostNL
PostNL has a large application landscape. Anything from distribution to sorting, from customer apps to integrations with partners is controlled through a complex environment of AWS accounts, microservices, event-driven integrations, permissions, and monitoring tools. Most of our applications are built using AWS serverless technology: API Gateways, Step Functions State Machines, S3 Buckets, DynamoDB Tables, and of course, thousands of Lambda Functions.
Some of these functions are invoked billions of times each month. At this scale, Lambda's performance characteristics become very relevant, and one of the most significant is the cold start. In this article, we will cover our view on cold starts and the top 5 optimizations we apply to reduce them.
What are Lambda cold starts?
Lambda Functions run in execution environments. These micro-VMs are spun up as Lambda Functions are invoked. When the Lambda Service provisions an execution environment, it downloads the function code and runs its initialization code. When these steps have completed, it runs your function’s handler code. This relatively slow process is called a cold start. When the handler code has completed, the execution environment is kept around to serve additional function invocations. When the next invocation reuses an existing execution environment, the code download and initialization steps are skipped, and the Lambda Service immediately executes the handler function. This process is much faster and is called a warm start. A detailed description of the Lambda initialization procedure can be found on the AWS Blog Operating Lambda: Performance optimization — Part 1 by James Beswick.
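You can observe the two paths yourself with a module-level flag. The sketch below is a minimal illustration (the flag and log messages are our own, not an AWS API): the initialization code runs once per execution environment, and the handler checks whether it is handling the first invocation in that environment.

import time

# Initialization code: runs once, when the execution environment is created.
INIT_TIME = time.time()
is_cold_start = True


def handler(event: dict, _context) -> dict:
    global is_cold_start
    if is_cold_start:
        # First invocation in this execution environment: a cold start.
        print(f"Cold start; environment initialized at {INIT_TIME}")
        is_cold_start = False
    else:
        # The environment was reused: a warm start.
        print("Warm start")
    return {"statusCode": 200}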
When to optimize cold starts
There are a number of actions and optimizations developers can take to reduce the cold start duration. However, the first question is: should you?
In a production environment with a relatively stable and constant workload, about 1% of all Lambda executions are cold starts (source). If these Lambda Functions are used for asynchronous or batch processing, a 1-second delay once in every 100 executions is generally not a problem. Consider the following use case: users sign up for a website. As soon as their sign-up is complete, their email address is stored in an SQS queue. Lambda polls the queue and sends out a welcome email to every record it finds. Does it matter if the email to one of those addresses is sent a second slower than the rest? Probably not.
Scenarios where cold starts do matter generally fall into two buckets: direct user interaction and time-sensitive processing. Direct user interaction describes Lambda Functions where the execution time is directly experienced by the user. An example is a Lambda Function generating an HTML page: the user clicks a link, the Lambda Function starts executing, and only when the Lambda Function is done does the user receive a response. If this takes 100 ms for warm starts but 1 second when no execution environment is available, 1% of visitors are going to have a bad experience. In this scenario optimizing cold starts will be worth the effort.
Cold starts are also relevant when a Lambda Function is executed as part of a time-sensitive event. A real PostNL example can be found in sorting machines: when a sorting decision is not made and returned quickly enough, a parcel might have to do an additional loop on the sorting line. If this happens too often, the machine might back up and would not be able to accept new parcels. A sorting decision generally has to be made in 1.5 seconds. If one in every hundred Lambda executions adds one second in cold start time, 1% of parcels are put on additional sorting loops, leading to reduced sorting efficiency.
If your Lambda Function impacts user experience or processes a time-sensitive event, read on. If not, ask yourself if optimizing cold starts is worth the development time. The optimizations below describe common methodologies for reducing cold starts, listed from most to least impactful. Some of these optimizations, like changing programming languages, will be hard to implement. Others, like tuning performance, might be easier. Which to apply depends on your environment and business goals.
Optimization 1: provisioned concurrency
The easiest and most effective method to reduce cold starts is enabling provisioned concurrency. This Lambda feature keeps a configured number of execution environments warm at all times. The init phase (everything that runs before the handler code) is executed out-of-band from Lambda Function invocations, which means the cold start will not affect users at all. Provisioned concurrency comes at a cost, though: because the execution environments are kept running continuously, you are billed for every millisecond that provisioned concurrency is enabled. Another caveat of provisioned concurrency is that requests above the configured concurrency value will still experience cold starts. The optimizations below affect all executions, both inside and outside provisioned concurrency.
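For illustration, provisioned concurrency is configured on a published version or alias of a function. A minimal boto3 sketch (the function name, alias, and value of 10 are placeholders):

import boto3

lambda_client = boto3.client("lambda")

# Keep 10 execution environments initialized for the "live" alias.
# Provisioned concurrency applies to a version or alias, not to $LATEST.
lambda_client.put_provisioned_concurrency_config(
    FunctionName="my-function",  # placeholder
    Qualifier="live",  # placeholder alias or version
    ProvisionedConcurrentExecutions=10,
)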
Optimization 2: language choice
If cold starts and performance are important metrics for your workload, choosing the most performant language is the single most impactful decision you can make. Let’s compare six commonly used languages: Java, C#, Node.js, Python, Go, and Rust.
At the risk of oversimplifying: Java and C# code runs in a virtual machine. For Java this VM is called the JVM; for C# it's the CLR. The VM needs to be started at runtime, which leads to increased cold start times. However, once the VM is running (read: on warm starts), code is generally executed very quickly. Because of the cold start penalty, using Java and C# for user-facing Lambda Functions is generally advised against.
Node.js and Python are interpreted languages. A binary (i.e. node or python) is used to execute the code written by developers, translating it into executable instructions on the fly: Node.js compiles JavaScript to machine code just in time (JIT), while Python interprets compiled bytecode. This runtime translation costs CPU time, which makes the execution of interpreted languages somewhat slower than that of VM-based languages. With interpreted languages there is no VM to start, however, so cold starts on Node.js and Python are faster than those on Java and C#.
The third category is compiled languages. Go and Rust are the most commonly used compiled languages in Lambda Functions. As the name implies, compiled languages are compiled for a specific CPU architecture (e.g. x86-64 or ARM). The compilation from source code to binary executable generally takes place in a build pipeline. Because the compiled code is already optimized for the target architecture, there is no runtime translation. Nor is there a VM or interpreter to start: the code executes as is, as soon as it is called. The cold start penalty for compiled languages is negligible, measured in single-digit milliseconds. Because there is no JIT compilation, compiled languages are also very performant at runtime.
See the chart below (source: Energy Efficiency across Programming Languages) for a runtime performance comparison across 27 different languages. The left column shows energy usage, the middle column execution time, and the right column memory usage. Compiled, interpreted, and VM-based languages are indicated with (c), (i), and (v), respectively. Note that compiled and VM-based languages top the charts, but the cold start impact is not included in these stats.
Optimization 3: improve performance to reduce concurrency
A cold start occurs when no warm execution environments are available. All existing execution environments might have expired, a Lambda Function might run in a different availability zone, or every existing execution environment might be in use. In the last case, a new environment is deployed to run in parallel with existing executions. This is called concurrency, and the two execution environments count as two concurrent executions. Each of those concurrent executions experiences its own cold start.
By reducing the execution time of the function itself, execution environments become available sooner, leading to more reuse and fewer cold starts. Of course, faster execution is always nice, but not always achievable. Lambda Function performance optimization is an entire topic in itself, but special mention should go to increasing the Function's memory size. More memory translates directly to more CPU power, and above a certain value even to more virtual CPUs (which can benefit multithreaded workloads). The CPU is throttled at any memory configuration below 1769 MB; at that value the function has the equivalent of one full, unthrottled vCPU, and higher configurations add additional vCPUs. Increasing the amount of memory up to 1769 MB will almost always lead to faster execution times, and as a direct result, to fewer cold starts. More memory means a higher cost per millisecond, but because the execution time goes down, the actual cost can remain the same or even be lower at higher memory configurations. Details about the relation between Lambda Function memory, CPU, and cost can be found in the AWS Blog Operating Lambda: Performance optimization — Part 2. That article also introduces Lambda Power Tuning, a tool that helps determine the ideal memory configuration for your workloads.
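Because memory size is a configuration setting, experimenting with it requires no code changes. A minimal sketch (the function name and value are placeholders; Lambda Power Tuning automates this search):

import boto3

lambda_client = boto3.client("lambda")

# Raise the memory size to 1769 MB, the point at which the function has
# the equivalent of one full vCPU. More memory also means more CPU.
lambda_client.update_function_configuration(
    FunctionName="my-function",  # placeholder
    MemorySize=1769,
)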
Optimization 4: initialization code
The code in a Lambda Function consists of two separate components:
- Code that runs once for every new execution environment. This is also known as bootstrap or initialization code.
- Code that runs on every Lambda Function invocation, cold or warm. This is also known as handler code.
The general guideline is to put any reusable, non-execution-specific code in the initialization part. This ensures the code is initialized only once per execution environment, making every subsequent (warm) invocation faster. Additionally, CPU throttling is only applied after the initialization phase, so Functions with lower memory configurations execute code faster during initialization than in the handler code. This is explained in more detail in the article Lambda Cold Starts and Bootstrap Code.
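As a sketch, the guideline looks like this (the bucket variable and handler below are illustrative): the client and configuration are created at module level, so they are constructed once per execution environment and reused by every invocation that follows.

import os

import boto3

# Initialization code: runs once per execution environment, during the
# (unthrottled) init phase of a cold start.
S3_BUCKET = os.environ.get("S3_BUCKET")
s3_client = boto3.client("s3")


def handler(event: dict, _context) -> dict:
    # Handler code: runs on every invocation and reuses the client above.
    s3_client.put_object(
        Key=event["s3_key"],
        Bucket=S3_BUCKET,
        Body=event["body"].encode(),
    )
    return {"status": "stored"}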
However, moving code to the initialization phase can reduce efficiency when a single handler file is used for multiple handler functions, a common scenario in REST APIs. In that setup, every cold start runs the initialization code for all handlers, even though any single invocation only needs part of it, adding an unnecessary burden to the cold start. When optimizing for shorter cold starts, you might want to opt for lazy initialization instead:
import os

import boto3

S3_BUCKET = os.environ.get("S3_BUCKET")


class Wrappers:
    """Container for lazily initialized, reusable AWS clients."""

    def __init__(self):
        self.s3_client = None
        self.ddb_client = None


wrappers = Wrappers()


def event_handler_1(
    event: dict,
    _context,
) -> dict:
    # Create the S3 client on first use, then keep it for warm starts.
    if not wrappers.s3_client:
        wrappers.s3_client = boto3.client("s3")
    s3_key: str = event["s3_key"]
    body: str = event["body"]
    wrappers.s3_client.put_object(Key=s3_key, Bucket=S3_BUCKET, Body=body.encode())
    return {"status": "stored"}


def event_handler_2(
    event: dict,
    _context,
) -> dict:
    # Create the DynamoDB client on first use, then keep it for warm starts.
    if not wrappers.ddb_client:
        wrappers.ddb_client = boto3.client("dynamodb")
    # Do something on DDB
    return {"status": "done"}
In this example, the S3 client is initialized only in the handler code of the first function, then stored for reuse. The same goes for the DynamoDB client in the second function. Be careful to use this approach only for actual reusable components, not for event-specific data like usernames, provided input, or passwords.
Optimization 5: package size
Lambda Function code and Lambda Layers are stored as ZIP files on S3. Lambda Container Images are stored on ECR, which itself uses S3. When a Lambda Function is deployed to an execution environment, the code package, including any dependencies installed with it, is copied from S3 to the EC2 instances underpinning the Lambda service. This movement of data costs time, and it will cost more time if there are more bytes to move. A good way to improve cold starts, then, is to reduce package size. Common methods to reduce package size, from most to least effective, are:
- Split large Lambda Functions into multiple smaller ones, each with its own dependencies.
- Use small Docker base images.
- Apply modular imports, e.g. only importing DynamoDB and S3 instead of the entire AWS SDK. For an example in JavaScript, see the AWS Blog Modular packages in AWS SDK for JavaScript.
- Remove unused dependencies.
- Remove unused code, such as tests.
- Minify code.
This optimization is listed last because reducing package size generally doesn't deliver large cold start improvements. This is especially true for lines-of-code reductions such as removing tests and minifying code.
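If you want to know whether trimming a dependency is worth it, measure what it costs during initialization. A minimal sketch (boto3 is just an example of a module-level import; CPython's python -X importtime flag gives a more detailed breakdown):

import time

# Time a module-level import: this cost is paid during the init phase of
# every cold start.
start = time.perf_counter()
import boto3  # example dependency

elapsed_ms = (time.perf_counter() - start) * 1000
print(f"import boto3 took {elapsed_ms:.1f} ms")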
Conclusion
In this article, we have covered what Lambda cold starts are, whether you should optimize them, and 5 ways to reduce cold start durations for your functions. At PostNL, these methods help us improve the user experience of our customer-facing applications and websites, and allow us to process your parcels quickly and consistently. We hope this overview helps you build better experiences too.