Benchmarking Lambda’s New Custom Runtime for .NET Core

Today, AWS released a new library named Amazon.Lambda.RuntimeSupport.
It contains a Custom Lambda Runtime made for those who want to use newer versions of .NET Core such as 2.2 or the 3.0 preview (and any future version).

Until now, Lambda has only provided native support for .NET Core 2.0 and 2.1. Amazon says that in the future they will only build Long Term Support releases of .NET Core directly into Lambda. For anything else, it looks like this new library is the solution.

In this post, I’ll talk about what AWS has done, and then take a look at how the new library performs against .NET Core 2.1 functions with native support.

Photo by Tim Gouw via Unsplash

Custom Runtimes

When you create a Lambda function, you simply upload your handler code. AWS manages everything else for you. This includes tasks such as deserializing input, running your code, handling errors, serializing output, and passing that output back to the Lambda service.

Custom runtimes, which were announced at re:Invent 2018, hand most of this responsibility over to you in exchange for a lot more freedom. This freedom has allowed people to extend Lambda with support for languages like Rust, PHP, and even Bash.

I recently published a blog post titled “Making .NET AWS Lambda Functions Start 10x Faster using LambdaNative” about a custom runtime I wrote.

All custom runtimes, including this new one, repeat the same basic steps:

  • Poll an HTTP endpoint to check for new invocations.
  • Deserialize the response body (the invocation payload).
  • Set some environment variables.
  • Run the handler code and catch any errors.
  • Serialize any error or handler output.
  • Send the output back to Lambda via the API.
  • Return to polling.

Note that they’re not really polling, because Lambda freezes its containers between executions. I’ve discussed this behaviour before in “What Happens to Running Threads When a Lambda Finishes Executing?”
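
As a rough illustration, the steps above can be sketched as a shell script, in the spirit of the Bash runtimes mentioned earlier. The endpoint paths follow the Lambda Runtime API. The handle_event function is a hypothetical stand-in for your handler, and the sketch skips the environment variable step and treats the payload as plain text rather than deserializing it.

```shell
#!/bin/sh
# Sketch of a custom runtime loop, not the RuntimeSupport implementation.

API_VERSION="2018-06-01"

# Base URL of the Runtime API. Lambda supplies the host and port
# in the AWS_LAMBDA_RUNTIME_API environment variable.
runtime_url() {
  printf 'http://%s/%s/runtime/invocation' "$AWS_LAMBDA_RUNTIME_API" "$API_VERSION"
}

# Hypothetical handler: upper-cases the payload, like the benchmark functions.
handle_event() {
  printf '%s' "$1" | tr '[:lower:]' '[:upper:]'
}

run_loop() {
  while true; do
    headers="$(mktemp)"
    # Blocks until Lambda has an invocation for this container.
    event="$(curl -sS -LD "$headers" "$(runtime_url)/next")"
    # The request ID arrives in a response header and is needed to reply.
    request_id="$(grep -Fi Lambda-Runtime-Aws-Request-Id "$headers" | tr -d '[:space:]' | cut -d: -f2)"
    rm -f "$headers"

    if result="$(handle_event "$event")"; then
      # Send the handler output back to Lambda.
      curl -sS -X POST "$(runtime_url)/$request_id/response" -d "$result" >/dev/null
    else
      # Report the failure so Lambda can mark the invocation as an error.
      curl -sS -X POST "$(runtime_url)/$request_id/error" \
        -d '{"errorMessage":"handler failed","errorType":"HandlerError"}' >/dev/null
    fi
  done
}

# Only start the loop when running inside Lambda.
if [ -n "${AWS_LAMBDA_RUNTIME_API:-}" ]; then
  run_loop
fi
```

The request to the next-invocation endpoint is where the "polling" happens: the call simply does not return until there is work to do.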

Amazon.Lambda.RuntimeSupport

When you use a custom runtime, your function is executed in a container that doesn’t have .NET Core installed. The RuntimeSupport library solves this using a self-contained deployment.

The template mentioned below is the one found in Amazon.Lambda.Templates. You can install the templates using dotnet new -i Amazon.Lambda.Templates then create a new project with dotnet new lambda.CustomRuntimeFunction.

Self-contained deployments are created using the --self-contained switch of dotnet publish. When this switch is present, which it is in the AWS template, the whole .NET Core runtime is published alongside your application.
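
For reference, a self-contained publish from the command line might look like the sketch below. The wrapper function name, the Release configuration, and the linux-x64 runtime identifier are assumptions for illustration; the AWS template's own tooling settings may differ.

```shell
# Publish a project as a self-contained deployment.
# Usage: publish_self_contained [project_dir]
publish_self_contained() {
  project_dir="${1:-.}"
  # --self-contained bundles the .NET Core runtime with the application;
  # --runtime picks the platform the bundled runtime targets.
  dotnet publish "$project_dir" --configuration Release --runtime linux-x64 --self-contained true
}
```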

The deployment packages produced by RuntimeSupport start at 30 MB, which is large compared to the 207 KB package you get normally.

Self-contained deployments produce an executable file, which means you need a static Main method. The Main method in the AWS template wraps your handler function in a HandlerWrapper object, passes that to the constructor of LambdaBootstrap, and calls RunAsync() on the latter. Don’t worry, you’ll be able to copy & paste this between projects.

By convention, Lambda expects custom runtimes to provide an executable file named bootstrap. The AWS template provides a shell script with that name which simply executes your assembly. This isn’t necessary if you just name your assembly file bootstrap.
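
To make the convention concrete, here is a minimal sketch that writes such a bootstrap script and marks it executable. The write_bootstrap helper and the MyFunction assembly name in the usage line are hypothetical; /var/task is where Lambda unpacks the deployment package.

```shell
# Write a bootstrap script that launches the published executable.
# Usage: write_bootstrap <assembly_name> [output_dir]
write_bootstrap() {
  assembly_name="$1"
  out_dir="${2:-.}"
  # The generated script replaces itself with the application process.
  cat > "$out_dir/bootstrap" <<EOF
#!/bin/sh
exec /var/task/$assembly_name
EOF
  # Lambda can only start the bootstrap file if it is executable.
  chmod +x "$out_dir/bootstrap"
}
```

Calling write_bootstrap MyFunction publish would produce publish/bootstrap, ready to be zipped alongside the published output.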

I could probably talk about custom runtimes all day, but I’ll control myself and move on to the benchmarks. If you want to learn more, I highly encourage you to have a read through the code on GitHub and read the docs.

Benchmarks

To perform these tests, I modified the Step Function I made to benchmark LambdaNative. It now repeatedly executes multiple Lambda functions in parallel, optionally modifying them between executions to force cold starts.

I then extracted timings from AWS X-Ray using a small script. If you’ve never seen an X-Ray trace before, take a look at the one below.

Initialization covers everything that happens before your code runs and only occurs on cold starts. Invocation is your actual code. Overhead occurs at the end of invocations that use a custom runtime.

Cold Starts

Let’s start with everyone’s favourite Lambda topic. The chart below shows the average total execution time for cold-start invocations…

Focusing on initialization, which only happens during cold starts, we get…

Likewise, the chart below shows only the actual invocation (and overhead where RuntimeSupport is used)…

Warm Invocations

Subsequent invocations are referred to as warm because they’re initialized and ready to go. The chart below shows the average total execution time of warm invocations…

My Interpretation

.NET Core 2.1 directly on Lambda performs consistently better in both cold and warm starts. For the cold starts, I believe the reason for that is the same reason I started playing with LambdaNative: just-in-time compilation is slow.

Remember, when you compile your code, you’re compiling it into Intermediate Language (IL). Then, at runtime, the IL is compiled into native machine code that the processor can execute.

Both the “Lambda Direct” and “RuntimeSupport” functions perform the same simple action of converting a string to uppercase. However, the RuntimeSupport library adds more code to compile at runtime.

Subsequent requests to warm containers benefit from the cold start compilation, which is why they’re so much faster in all cases. The HTTP request and response processing involved in RuntimeSupport is probably part of the reason it’s a bit slower.

Finally, we can see that there is no real difference between 2.2 and 3.0-preview3 when tiered compilation is disabled on the latter.

Tiered Compilation vs Compile Once

When I first ran these benchmarks, I didn’t realise Microsoft had changed the default setting for Tiered Compilation in .NET Core 3.0-preview3. It’s now enabled by default!

In short, when tiered compilation is enabled, your code is compiled to machine code in two stages. The first time it runs, it will be compiled quickly to improve startup speed. When the same code is accessed again later, it’s recompiled with more optimisations applied for better steady-state performance.

We can actually see tiered compilation in action quite clearly, especially with the smaller memory configurations. The cold starts are faster when tiered compilation is enabled, but the warm starts are much slower.

It’s still strange that the warm starts don’t get back to a similar speed eventually. Maybe tiered compilation is somehow being affected by Lambda’s virtualisation?

Important: If you use .NET Core 3.0-preview3 in a Lambda, make sure you add <TieredCompilation>false</TieredCompilation> to the PropertyGroup of your project file to disable tiered compilation.
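
In context, the setting sits in the project file roughly like this. The surrounding properties are assumptions about a typical custom runtime project, not a copy of the AWS template.

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <!-- Assumed values for a custom runtime project -->
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.0</TargetFramework>
    <!-- Disable tiered compilation -->
    <TieredCompilation>false</TieredCompilation>
  </PropertyGroup>
</Project>
```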

Conclusion

Unless you have a problem only a newer version of .NET Core can solve, or simply enjoy the bleeding edge, you may want to continue using the directly supported LTS versions for these benefits:

  • Higher performance.
  • Security/bug patching managed by AWS.
  • Simpler implementation/codebase.
  • Smaller deployment packages.

For more like this, please follow me on Medium and Twitter.