Solving cold starts on AWS Lambda when using .NET Core

Diego Garber
Slalom Build
10 min read · Oct 7, 2020

AWS Lambdas and .NET

The concept of serverless computing is incredible. No server provisioning, no worries about web server management, just focus on your code and pay for what you use. Using just a little? Pay just a little. Using a lot? The services will scale accordingly. No need to spin up more instances with a script and wait for them to start!

AWS Lambdas are here to take care of our needs (as long as we understand how they work ^_^).

Cold starts on AWS Lambda using .NET Core

When I first saw a demo for a serverless application hosted on AWS Lambda using .NET Core, I noticed that the first execution of each function took a very long time… some pages took more than 10 seconds to load.

Granted, the application was in the development phase, but something needed to be done to mitigate this problem.

It turns out the problem appears when a Lambda function is performing what’s called a cold start. To understand what a cold start is, we first need to understand how AWS Lambdas work.

How do AWS Lambdas work?

Lambdas work by having a Linux container take care of each user (or system) request. In the case of an API server, each API call is one request, and each request is processed by a separate container.

It’s critical to note that each container will only process one request at a time, and all containers are isolated from one another. From a scalability point of view, this is fantastic. We can have thousands of users interacting with our application at the same time and we only pay for their usage and no more.

But depending on how we manage our Lambdas, they may be very user unfriendly due to the problem of cold starts.

What’s a cold start?

What we normally call cold start has to do with the creation and initialization of a new container that will process our request, as well as the initialization of the program that the container will run.

While the program is running, the container stays active and executing (in other words, it stays warm). But when the program stops executing and the container is inactive for some time, AWS drops the container.

The next request after that will have to spin up a container again: a new container and program are launched to service the request. The additional time taken to launch this new container is what we call a cold start.

Hopefully this container will be reused for multiple requests, minimizing the number of cold starts during the life of our application.

When do cold starts happen?

A cold start occurs on the initialization of the container (and the program within it) when the first request is received. That container will eventually be killed after sitting unused, and the next request will need a new cold start.

Additionally, AWS will cycle the containers even if they are used periodically. So when this happens, the next request will also be a cold start. While there’s no clear indication of how long AWS keeps Lambdas alive, here’s an experiment that provides some insight into it.

If there are concurrent requests, AWS will create more containers. And if there are 5 concurrent requests, we need 5 different containers to be able to handle them! Otherwise, a request has to wait for another container to finish its execution, or it fails with an exception.

How bad can this get?

A “hello world” application using .NET Core 2.1 that only displays hard-coded text takes 2.1 seconds to initialize when configured with 256MB.

Depending on your configuration, this will happen the first time you hit each function, which can create a poor user experience — even more so if you have one function call another. Keep in mind that these numbers are for an API that doesn’t have dependencies or connections to a database, critical bits of application infrastructure that will further slow container load times.

Let’s fix our problem: utilizing our own custom Runtime

This section is intended to give you an idea of how to solve this issue, but if you are looking for a detailed step-by-step guide, please check the documentation in this repository:

https://github.com/polgaro/LambdaExample

The comparisons were performed using the “Empty Serverless Application” template that comes with the AWS Toolkit for Visual Studio.

As a baseline, we’ll use .NET Core 2.1 and create a function called “Get2”.
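For reference, here’s a minimal sketch of what that baseline function might look like (the request/response types come from the template; the body shown here is illustrative):

    using System.Collections.Generic;
    using System.Net;
    using Amazon.Lambda.APIGatewayEvents;
    using Amazon.Lambda.Core;

    public class Functions
    {
        // Baseline handler: hard-coded text, no dependencies, no database.
        public APIGatewayProxyResponse Get2(APIGatewayProxyRequest request, ILambdaContext context)
        {
            return new APIGatewayProxyResponse
            {
                StatusCode = (int)HttpStatusCode.OK,
                Body = "Hello from Get2",
                Headers = new Dictionary<string, string> { { "Content-Type", "text/plain" } }
            };
        }
    }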

We’re creating a new function only so that we don’t use the default one created by the template. If we were to use the default “GET”, we’d get the same numbers.

Ouch! 2.2 seconds.

Let’s switch to .NET Core 3.1

The next step is to switch to a custom Runtime and to .NET Core 3.1. You can follow along with the code by reading the step-by-step implementation here.

For this, we’ll have to use the “provided” Runtime. After adjusting our code, our time is… the same.
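At its core, a custom Runtime is just a console application that polls Lambda’s runtime API, which the Amazon.Lambda.RuntimeSupport package handles for us. A minimal sketch, assuming the Get2 handler from before and the Newtonsoft-based serializer:

    using System;
    using System.Threading.Tasks;
    using Amazon.Lambda.APIGatewayEvents;
    using Amazon.Lambda.Core;
    using Amazon.Lambda.RuntimeSupport;
    using Amazon.Lambda.Serialization.Json;

    public class Program
    {
        private static async Task Main(string[] args)
        {
            // Wrap our handler and start the loop that polls Lambda for invocations.
            Func<APIGatewayProxyRequest, ILambdaContext, APIGatewayProxyResponse> handler = new Functions().Get2;
            using var wrapper = HandlerWrapper.GetHandlerWrapper(handler, new JsonSerializer());
            using var bootstrap = new LambdaBootstrap(wrapper);
            await bootstrap.RunAsync();
        }
    }

The project is then published as an executable, since Lambda’s “provided” Runtime looks for a file named bootstrap in the deployment package.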

But this is only the beginning!

Adding ReadyToRun!

Adding ReadyToRun will require us to compile on Linux (or in a Linux-based container) because the underlying technology (crossgen) is not cross-platform between Windows and Linux.

Remember that you can follow the step-by-step guide to see where I’m getting the numbers.
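In MSBuild terms, ReadyToRun is a publish-time switch. A sketch of the relevant .csproj properties (the linux-x64 runtime identifier assumes the standard x64 Lambda environment):

    <PropertyGroup>
      <PublishReadyToRun>true</PublishReadyToRun>
      <RuntimeIdentifier>linux-x64</RuntimeIdentifier>
    </PropertyGroup>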

Measuring the time: bingo! Total time is under 1.2 seconds! Not bad, huh?

But can we beat that? You bet!

Trimming our assemblies

By changing our MSBuild parameters, we can trim the unused assemblies.
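In the project file, that might look like the following (added next to the ReadyToRun properties from the previous step):

    <PropertyGroup>
      <PublishTrimmed>true</PublishTrimmed>
    </PropertyGroup>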

And the results? 1.1 seconds. That’s half of the original time.

Can we keep going? Yes, we can!

Initializing the Newtonsoft.Json library ahead of time

Part of the time is spent JITting the libraries, and also executing those libraries before tier 1 compilation (full JIT) has completed. You can read more about tiered compilation here.

So, initializing Newtonsoft.Json with a really small payload gets us down to one second! Great!
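A warm-up along these lines, run during startup before the first request is handled, might look like this (the payload and the helper name are illustrative):

    using Newtonsoft.Json;

    public static class SerializerWarmUp
    {
        // Serialize and deserialize a tiny payload so Newtonsoft.Json’s
        // code paths are JITted before the first real request arrives.
        public static void Run()
        {
            var json = JsonConvert.SerializeObject(new { warm = true });
            JsonConvert.DeserializeObject<object>(json);
        }
    }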

Disabling some optimizations

Even though disabling optimizations may sound counter-intuitive, some functions are executed only once, and we can save time by instructing the runtime not to perform tier 1 (full JIT) compilation on them.
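In .NET this hint can be expressed per method with an attribute; a sketch for a run-once startup routine (the method itself is illustrative):

    using System.Runtime.CompilerServices;

    public static class Startup
    {
        // This runs exactly once, so skipping JIT optimization avoids
        // paying for an optimized compilation we would never benefit from.
        [MethodImpl(MethodImplOptions.NoOptimization)]
        public static void ConfigureOnce()
        {
            // One-time wiring: configuration, DI registrations, etc.
        }
    }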

After disabling some optimizations in our startup routines, we get… 800ms!

This is a solution we can present to our stakeholders!

Additional recommendations for reducing cold starts

In addition to the investigation above, we have several other ways to address cold starts:

1. Increasing the amount of memory we use for each container

AWS scales the CPU allocated to our containers with the amount of memory: the more memory we configure, the more CPU resources we get.

As you can imagine, increasing the amount of memory impacts the amount of money we pay per request, so please do the math before making this change!

2. Paying for our applications to be ready and always initialized

A drawback to this approach is that we pay for each unit of provisioned concurrency. This makes our Lambdas more expensive and their cost harder to estimate.

How many users will I have? How many of them are going to use the application at the same time? How much am I willing to pay? And most importantly, why am I using a serverless model in the first place if I’m going to be paying by the hour?

3. Creating our own Runtime

For robust enterprise development, you can create your own custom Runtime. It may take you some time to code, but it allows you to customize startup so you can, for example, initialize libraries ahead of the first call.

Some of these tweaks are extremely beneficial, especially if you are using more than 1,792MB of memory, which gives you a second processor core.

We can create our own Runtime that will allow us to use multiple approaches to reduce the time:

  • Utilize the “provided” Runtime instead of the managed Runtimes that come with .NET preinstalled.
  • Customize the creation of the deployment package, using ReadyToRun, assembly trimming, etc.
  • Initialize (JIT) the assemblies we will use with a small payload ahead of their actual use.

4. Using ReadyToRun on the AWS .NET Core 3.1 Runtime

If you’d asked me a couple months ago, I’d have said that to have a quality deployment of .NET Core + AWS Lambda, you have to have your own Runtime. But since then, Amazon introduced support for .NET Core 3.1, which opened the door to using ReadyToRun without having to create your own Runtime.

But as with most things in software architecture, it depends on your needs. Using ReadyToRun on the managed 3.1 Runtime can reduce cold start times by more than 40%, making it a great alternative for rapid development. It’s fast enough and really easy to implement.

See the Suggested Reading section to learn more about ReadyToRun, and follow the link Lambda support for .NET Core 3.1 for details on the new Runtime.
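On the managed 3.1 Runtime, the new serializer is wired up with an assembly-level attribute. A minimal sketch (the exact type name varies by package version; recent versions of Amazon.Lambda.Serialization.SystemTextJson call it DefaultLambdaJsonSerializer):

    using Amazon.Lambda.Core;
    using Amazon.Lambda.Serialization.SystemTextJson;

    // Tell Lambda to (de)serialize events with the System.Text.Json-based
    // serializer that ships with the .NET Core 3.1 support packages.
    [assembly: LambdaSerializer(typeof(DefaultLambdaJsonSerializer))]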

There’s a big caveat, though. The new Serializer provided by Amazon is severely lacking: because it is built on System.Text.Json, it does not support reference loops.

You can read about System.Text.Json and how it measures up to Newtonsoft.Json, and make your own decision based on your needs.
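To make the loop limitation concrete: given an object graph that references itself, Newtonsoft.Json can be configured to cope, while System.Text.Json throws on the same graph in .NET Core 3.1. A sketch:

    using Newtonsoft.Json;

    public class Parent { public Child Child { get; set; } }
    public class Child { public Parent Parent { get; set; } }  // reference loop

    public static class LoopDemo
    {
        public static void Run()
        {
            var parent = new Parent();
            parent.Child = new Child { Parent = parent };

            // Newtonsoft.Json can be told to skip the cycle:
            var settings = new JsonSerializerSettings { ReferenceLoopHandling = ReferenceLoopHandling.Ignore };
            var json = JsonConvert.SerializeObject(parent, settings);

            // System.Text.Json.JsonSerializer.Serialize(parent) throws a
            // JsonException on this same graph in .NET Core 3.1.
        }
    }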

Let’s compare the numbers

Putting all these ideas together, here are some benchmarks for the speed improvements you will see on cold starts using each approach:

  • 2.1: Using .NET Core 2.1: 2200ms
  • 3.1: Using the custom Runtime with .NET Core 3.1: 2200ms
  • 3.1 R2R: Adding ReadyToRun over the custom Runtime: 1200ms
  • 3.1 R2R+T: Adding trimming over the previous test: 1100ms
  • 3.1 R2R+T+IJ: Adding initialization of the Json library over the previous test: 1000ms
  • 3.1 R2R+T+IJ+O: Adding optimization hints over the previous test: 800ms
  • 3.1 NS: Using the new Serializer and managed Runtime: 400ms

Note that we never changed our memory size. If you do that, you can get almost negligible cold starts, as Amazon will increase the CPU allocated to the process (don’t forget that this comes at a cost, though).

You can get the code that I used for the examples here.

Pros and cons of choosing to use our own Runtime

There are multiple advantages to creating your own Runtime to solve the cold start problem:

  • We can switch to our own Runtime without having to change the rest of our code. Once you know what to change, the process is painless.
  • We can ensure that components like Newtonsoft.Json are being initialized with small footprints.
  • We can hint the compiler to not optimize certain functions, reducing the cold start times even further.
  • We can make use of multiple cores (when using more than 1,792MB) to initialize libraries in parallel even before we process the first request (see the sketch after this list).
  • Support for layers: we can have one layer with shared code and reuse it in all our Lambdas.
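A sketch of that parallel warm-up idea, reusing the hypothetical helpers from the earlier sections:

    using System.Threading.Tasks;

    public static class ParallelWarmUp
    {
        // With more than 1,792MB we get a second core, so independent
        // warm-ups can run concurrently before the first request.
        public static Task RunAsync() =>
            Task.WhenAll(
                Task.Run(() => SerializerWarmUp.Run()),    // hypothetical serializer warm-up
                Task.Run(() => Startup.ConfigureOnce()));  // hypothetical run-once routine
    }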

Detracting from these advantages is the fact that we are not using any of Amazon’s managed Runtimes. For example, at the time of writing this article, custom Runtimes run on Amazon Linux (instead of Amazon Linux 2).

In any case, maintaining a custom Runtime may not be something you feel comfortable doing.

What about choosing to use the .NET Core 3.1 Runtime?

This approach also has some advantages:

  • If you can get around the Serializer, this may be the fastest solution.
  • If you’re creating a new application from scratch, you can design your data transfer objects (DTOs) without loops, avoiding the single biggest pain of using the new Serializer.
  • You can create your own custom Serializer, as sketched below (with the understanding that using Newtonsoft.Json in a custom Serializer may add a considerable amount of time to your cold starts).
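A minimal sketch of such a custom Serializer built on Newtonsoft.Json, implementing Amazon’s ILambdaSerializer interface (keeping the cold-start caveat above in mind):

    using System.IO;
    using Amazon.Lambda.Core;
    using Newtonsoft.Json;

    public class NewtonsoftLambdaSerializer : ILambdaSerializer
    {
        private static readonly JsonSerializer Serializer = JsonSerializer.Create();

        public T Deserialize<T>(Stream requestStream)
        {
            using var reader = new StreamReader(requestStream);
            return (T)Serializer.Deserialize(reader, typeof(T));
        }

        public void Serialize<T>(T response, Stream responseStream)
        {
            // Leave the stream open; Lambda owns its lifetime.
            using var writer = new StreamWriter(responseStream, leaveOpen: true);
            Serializer.Serialize(writer, response);
            writer.Flush();
        }
    }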

There are a few challenges with adopting this approach, as well:

  • The Serializer that Amazon provides is lacking, as it uses System.Text.Json.
  • We don’t have a way to initialize libraries before the Serializer is invoked.
  • No layer support: you have to upload the whole codebase for each Lambda you have.
  • When migrating from a previous version, you may need to change your application’s code when swapping the Serializer.

Conclusion

By creating our own Runtime, we can save a lot of cold start time by using ReadyToRun, hinting the compiler not to optimize run-once code, and optimizing the functions we do want optimized (while initializing them with small footprints).

It’s a good idea to mark some functions to be optimized, as we do want the program to run as fast as possible. However, some functions will be executed only once and don’t need to be optimized. The compiler won’t know that those functions will not be called again, but we do!

If you are creating your application from scratch, or if you don’t think you’ll have problems with the Serializer, try out the new .NET Core 3.1 Runtime! It provides really fast start-up times and it’s easy to set up!

Remember, we took a running API from 2200ms to 800ms using a Custom Runtime. But is this all we can do? Nope!

In a future article, you’ll see how to optimize your code for Lambdas by structuring your dependency injections, mappings, and Entity Framework calls.

Stay tuned!

Suggested Reading

Step by step implementation of a custom Runtime in AWS:

https://github.com/polgaro/LambdaExample

Lambda configuration and how memory affects performance:

https://docs.aws.amazon.com/lambda/latest/dg/configuration-console.html

Provisioned instances:

https://aws.amazon.com/lambda/pricing/#Provisioned_Concurrency_Pricing

Virtualization technology used by AWS Lambda:

https://firecracker-microvm.github.io/

Lambda support for .NET Core 3.1:

https://aws.amazon.com/blogs/compute/announcing-aws-lambda-supports-for-net-core-3-1/

ReadyToRun:

https://docs.microsoft.com/en-us/dotnet/core/whats-new/dotnet-core-3-0#readytorun-images

System.Text.Json vs Newtonsoft.Json:

https://docs.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-migrate-from-newtonsoft-how-to#preserve-object-references-and-handle-loops

Diego Garber
Slalom Build

Solutions Architect for Slalom Build Chicago, currently focused on cloud technologies and .NET Core