Making .NET AWS Lambda Functions Start 10x Faster using LambdaNative
Cold starts are one of the most misunderstood aspects of AWS Lambda, yet they can have the largest impact on a function’s performance. Many developers have tried to solve the problem by invoking their function on a schedule to keep it warm. This doesn’t actually work, and it misses the point of functions as a service. You can increase the memory limit to speed things up, but that costs more and only makes things slightly better.
Today, I present LambdaNative to you, a solution that gives you the best of both worlds (cost and performance).
What are cold starts and why are they slow?
AWS Lambda functions run inside containers (Firecracker microVMs). Each container can run a single invocation of your function at a time. At any given time, Lambda has a pool of zero or more containers for each function. When a function is called, either an existing container is reused or a new one is created. The colloquial term “cold start” refers to a new container starting.
The fact you need a container for each parallel request, and that they can be terminated at any time, means you can’t keep a function warm. Not that you should want to.
Some of the reasons cold starts are slow, such as copying your code from S3 or VPC networking, apply to all of the supported runtimes, not just .NET Core. The biggest performance hit for .NET functions, however, comes from converting the Common Intermediate Language (CIL) code into machine code via just-in-time (JIT) compilation.
The more code and libraries you use in your function, the slower it will be. Less abstraction and more action leads to faster cold starts.
How slow is slow?
I’m glad you asked because I’ve got the numbers! To get this data, I created a test Lambda function that…
- Takes an APIGatewayProxyRequest,
- Reads some environment variables,
- Writes to a DynamoDB table,
- Serializes an object as JSON,
- Publishes that JSON to an SNS topic, and
- Returns an APIGatewayProxyResponse.
This is a realistic use case which involves libraries such as Amazon.Lambda.APIGatewayEvents, AWSSDK.DynamoDBv2, and AWSSDK.SimpleNotificationService, all of which will need to be compiled to machine code.
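To make the workload concrete, here is a hypothetical sketch of such a handler. The class, method, and environment variable names are my assumptions for illustration, not the author’s actual test code:

```csharp
// Hypothetical sketch of the test handler described above; names are
// illustrative assumptions, not the author's actual code.
using System;
using System.Threading.Tasks;
using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DocumentModel;
using Amazon.Lambda.APIGatewayEvents;
using Amazon.Lambda.Core;
using Amazon.SimpleNotificationService;
using Newtonsoft.Json;

public class TestFunction
{
    private static readonly IAmazonDynamoDB DynamoDb = new AmazonDynamoDBClient();
    private static readonly IAmazonSimpleNotificationService Sns =
        new AmazonSimpleNotificationServiceClient();

    public async Task<APIGatewayProxyResponse> Handle(
        APIGatewayProxyRequest request, ILambdaContext context)
    {
        // Read some environment variables.
        var tableName = Environment.GetEnvironmentVariable("TABLE_NAME");
        var topicArn = Environment.GetEnvironmentVariable("TOPIC_ARN");

        // Write to a DynamoDB table.
        var table = Table.LoadTable(DynamoDb, tableName);
        await table.PutItemAsync(new Document { ["Id"] = context.AwsRequestId });

        // Serialize an object as JSON and publish it to an SNS topic.
        var json = JsonConvert.SerializeObject(new { request.Path, context.AwsRequestId });
        await Sns.PublishAsync(topicArn, json);

        return new APIGatewayProxyResponse { StatusCode = 200, Body = json };
    }
}
```

Every type used here, from the SDK clients to the JSON serializer, contributes CIL that the JIT has to compile on a cold start.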
I then created an AWS Step Function to continuously modify and directly invoke the above function. The modification is changing the timeout, which currently causes Lambda to perform a cold start. The ModifyLambda state executes a different function that performs the modification and returns an APIGatewayProxyRequest. The output of each state is used as the input to the next, so InvokeLambda invokes the test function with the request it expects.
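The two-state loop can be sketched in Amazon States Language. The ARNs are elided, and a real state machine also needs a terminal state (for example, a Choice that stops after N iterations), so treat this as an illustrative fragment only:

```json
{
  "StartAt": "ModifyLambda",
  "States": {
    "ModifyLambda": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...",
      "Next": "InvokeLambda"
    },
    "InvokeLambda": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:...",
      "Next": "ModifyLambda"
    }
  }
}
```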
Finally, I used AWS X-Ray to record accurate timings for each invocation.
A standard cold start consists of initialization and invocation.
Initialization, which only happens on cold starts, is everything up until your class method is executed (including instantiating your class if it’s not static).
Invocation includes deserializing input, running your code, and serializing output.
The graphs in this post all show 25 sequential invocations of the test function along the x-axis and timings on the y-axis. The lines represent different memory limit configurations available at the time of testing.
Remember, the amount of CPU your function can use increases with memory.
As you can see, there’s not much difference between the memory limits. Initialization is between 200 ms and 350 ms for standard .NET functions.
In the invocation phase, however, we see a huge difference between memory limits. This difference is caused by JIT’ing and running the actual code.
It’s clear that 128 MB is the slowest with an average of 10,671 ms, while the fastest is 3008 MB with an average of 802 ms. The fact that there is very little improvement after 1024 MB is interesting.
What does fast look like?
The following graphs show the same function running under LambdaNative.
There’s not a lot of performance to be gained during the initialization phase, but there is still some:
Under LambdaNative, initialization is between 150 ms and 250 ms, which is a 25% improvement. That’s nice, but we’re just getting started. The real savings are made during the invocation step.
128 MB is now averaging 1,656 ms (a 10x improvement) and the rest are all faster than 3008 MB was previously. 3008 MB itself is down to 91 ms on average (an 8.8x improvement).
The graph below shows standard (dotted lines) and LambdaNative (solid lines) overlaid for comparison.
LambdaNative also has faster warm starts. The graph below shows the first 24 warm invocations after a cold start.
Notice how the first warm start of a standard function is slower than the rest? LambdaNative also mitigates that strange behaviour.
What is LambdaNative?
At the end of 2018, AWS announced Custom Runtimes and the Runtime API that enables them. In a nutshell, you select the Custom Runtime option and provide an executable file named bootstrap (in your .zip file) which AWS Lambda will execute instead of its own runtime. You’re then responsible for interacting with an HTTP API to get invocation events, running your handler code, and reporting the result back to the API.
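To make that concrete, here is a minimal sketch of the loop a bootstrap executable must implement, using the documented /2018-06-01/runtime/invocation endpoints of the Runtime API. The echo “handler” is a placeholder for your actual code:

```csharp
// Minimal sketch of the Runtime API loop a custom runtime must implement
// (this is the plumbing LambdaNative handles for you).
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class Bootstrap
{
    public static async Task Main()
    {
        var api = Environment.GetEnvironmentVariable("AWS_LAMBDA_RUNTIME_API");
        using (var http = new HttpClient())
        {
            while (true)
            {
                // 1. Block until the next invocation event is available.
                var next = await http.GetAsync(
                    $"http://{api}/2018-06-01/runtime/invocation/next");
                var requestId = next.Headers
                    .GetValues("Lambda-Runtime-Aws-Request-Id").First();
                var input = await next.Content.ReadAsStringAsync();

                // 2. Run the handler code (placeholder: echo the input).
                var output = input;

                // 3. Report the result back to the Runtime API.
                await http.PostAsync(
                    $"http://{api}/2018-06-01/runtime/invocation/{requestId}/response",
                    new StringContent(output));
            }
        }
    }
}
```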
LambdaNative is a library that handles the API interaction and error handling for you. All you need to do is tell it which handler to execute by implementing an interface and calling a single entry-point method.
You can then use CoreRT to perform ahead-of-time (AOT) compilation, producing a native executable that doesn’t require any runtime compilation.
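As a rough sketch, the build might look like the following. The package version, target framework path, and the MyFunction name are assumptions; check the CoreRT and LambdaNative documentation for the exact steps:

```shell
# Rough sketch only: versions, paths, and the MyFunction name are assumptions.
# Add the CoreRT ahead-of-time compiler to the project:
dotnet add package Microsoft.DotNet.ILCompiler --version 1.0.0-alpha-*

# Publish for Lambda's platform; this produces a single native executable:
dotnet publish -r linux-x64 -c Release

# Lambda runs an executable named 'bootstrap' from the root of the .zip:
cp bin/Release/netcoreapp2.1/linux-x64/publish/MyFunction bootstrap
zip package.zip bootstrap
```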
I’m very happy to announce that v1.0.0 is now publicly available!
The README in the example directory on GitHub has very detailed instructions on how to get started with LambdaNative.
This is obviously a bit more work, but the results speak for themselves. Having said that, integrating it all into your build system would hide most of the added complexity.
That’s it! If you try it out or have any feedback, please let me know!