Suppose you want to buy a ticket to a concert of your favorite band, and you know that tickets are running out. You turn on your computer, wait for the browser to launch, go to the ticket sales website, see that only one ticket remains, and click to buy. Then boom: “Sold out” is the phrase you are going to hate for a while. We are sad to say it, but you have been “cold started”. You would have caught the last ticket if your browser window had already been open, showing the website, and ready for you to click the “buy” button. You should have kept your computer warm to buy this ticket.
The story is similar if you rely on serverless environments in production. When your function is triggered for the first time after a period of inactivity, the cloud provider initializes a container to run it, much like you had to turn on your computer and launch a browser to buy tickets. This adds a latency of up to 5 seconds, which is hardly acceptable in production. This problem is called a “cold start”. Your function might hit a cold start exactly when a customer wants to use your product. Your customers will not stay sad forever (as you might about your sold-out ticket), but next time they may well prefer your competitor.
You may think that if a cold start happens only on the first request, there is nothing to worry about: if one user out of a million experiences a cold start, it will hardly cause much annoyance. Unfortunately, this is not the case. AWS keeps idle containers up only for a limited time, which seems to be about 45 minutes today but is subject to change. If invocations of your function are sparse, you are more exposed to cold starts. Moreover, if your function receives concurrent invocations, AWS initializes containers concurrently, and every one of those new containers goes through its own cold start at the same time. Nightmare, right? Although this architecture is easy to start with and extremely cheap, you cannot rely on it without precautions.
How can you avoid cold starts?
You cannot completely avoid cold starts; they will always happen. The good news is that there are methods to reduce both their frequency and their duration, so that your Lambdas are cold started less often. Here are a few tips you can use:
- Instead of Java and C#, whose runtimes (the JVM and .NET) take longer to initialize, you may prefer dynamically typed languages such as Python or JavaScript on Node.js.
- Unless you are required to, avoid putting your Lambdas in a VPC. Running inside a VPC adds extra initialization time and makes cold starts noticeably longer.
- HTTPS calls inside your Lambda can noticeably lengthen invocations, especially the first one in a new container. The SSL handshake and other security-related work are CPU-bound, so they add to the startup latency of your function. Use them wisely!
- If you prefer Java for your AWS Lambda functions, you should definitely avoid dependencies that scan the classpath, such as Spring. They are an open invitation to slow cold starts. Moreover, loading the classes of a Java function takes time and lengthens the cold start.
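A common way to soften the HTTPS overhead is to create expensive clients once per container and reuse them across warm invocations, so the connection and handshake cost is paid once rather than on every call. The sketch below only illustrates that caching pattern; `make_client` is a hypothetical stand-in for whatever HTTP or SDK client your function actually uses, and it does not remove the cost from the cold start itself.

```python
import itertools

# Counts how many times the hypothetical client factory runs, for illustration only.
_creations = itertools.count(1)


def make_client():
    """Hypothetical stand-in for an expensive HTTPS/SDK client (TLS handshake, etc.)."""
    return {"client_id": next(_creations)}


# Module-level state lives as long as the container does, so this cache
# survives across warm invocations handled by the same container.
_client = None


def get_client():
    """Create the client lazily on first use, then reuse the same instance."""
    global _client
    if _client is None:
        _client = make_client()
    return _client


def handler(event, context):
    client = get_client()  # created once per container, reused afterwards
    return {"client_id": client["client_id"]}
```

Calling `handler` repeatedly in the same process returns the same `client_id`, which is the behavior a warm container would show.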
There is also some misinformation that is easy to believe. For example, many people think that allocating more memory leads to fewer cold starts. This is both right and wrong at the same time. Allocating more memory also allocates more CPU, so initialization takes less time. On the other hand, in his excellent article, Yan Cui benchmarks cold starts and shows that AWS tends to shut down containers with more memory sooner, because the resources they hold can be used by other invocations. His research shows that a function with 1536MB of memory is kept idle approximately 15 minutes less than a function with 128MB. Therefore, it is fair to say that allocating more memory shortens cold starts, but your container stays warm for less time. You need to take this trade-off into account when choosing the memory size of your function. Note that AWS's behavior in shutting down and starting containers is not final and may change over time.
Another common belief is that the likelihood of a cold start depends on the size of the deployment package. In practice, package size appears to have neither a positive nor a negative effect.
Keeping your containers warm
While you take measures against cold starts, AWS is also improving its environment to make AWS Lambda more usable in production. Yan Cui’s experiments show that AWS Lambda keeps idle containers up for approximately 45 minutes. This, too, is subject to change: the period used to be far shorter, and AWS keeps improving it without announcements. There is also something you can do yourself: “warm-up functions”.
To prevent containers from going down, you need to send dummy requests to your functions at some frequency. Of course, you also need to change your Lambda so that it can distinguish warm-up calls from customer calls. This sounds ironic, since we chose Lambda precisely because it promised we would not have to deal with scalability. However, we need to keep our customers happy and make them wait as little as possible.
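Distinguishing warm-up calls from customer calls usually means checking a marker in the event and returning early. The sketch below assumes a hypothetical convention where warm-up calls carry `{"warmup": true}` in their payload; real tools, including Thundra's module, may use a different marker, and `process_order` is a hypothetical placeholder for your business logic.

```python
# Hypothetical marker: this sketch assumes warm-up calls send {"warmup": true}.
WARMUP_KEY = "warmup"


def is_warmup(event):
    """Return True if the event looks like a warm-up call rather than a real request."""
    return isinstance(event, dict) and event.get(WARMUP_KEY) is True


def process_order(event):
    """Hypothetical business logic that handles real customer requests."""
    return {"status": "processed", "order": event.get("order_id")}


def handler(event, context):
    if is_warmup(event):
        # The call only exists to keep this container alive, so skip real work.
        return {"status": "warm"}
    return process_order(event)
```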
In theory, warm-up calls make AWS keep the containers up, so you never experience a cold start. Sounds nice, right? But it is not that straightforward.
One warm-up call can keep one of your containers warm, but what if you need to keep more containers warm? If you are using Lambda in production, you will likely need more than one container up at the same time, so you need to make warm-up calls in parallel to keep the desired number of containers up. Even then, two risks remain:
- All of your containers are busy with warm-up calls, so a real customer request cannot find a container to run on. This again causes a cold start, this time for a real call.
- A single container can handle all the warm-up calls by itself, so the other containers receive none and are shut down after a while.
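Making warm-up calls in parallel can be sketched with a thread pool that fires N invocations at once. To keep the example self-contained, `invoke` is an injected callable; in a real deployment it might wrap boto3's Lambda client, for example `client.invoke(FunctionName=..., InvocationType="RequestResponse", Payload=...)`, which blocks until the function returns. The `{"warmup": true}` payload is the same hypothetical marker as before.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def send_warmup_calls(invoke, function_name, count):
    """Fire `count` warm-up invocations in parallel and return their results.

    `invoke(function_name, payload_bytes)` is any callable that performs one
    synchronous invocation (e.g. a wrapper around a Lambda SDK client).
    """
    if count <= 0:
        return []
    payload = json.dumps({"warmup": True}).encode("utf-8")  # hypothetical marker
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(invoke, function_name, payload) for _ in range(count)]
        return [f.result() for f in futures]
```

For a dry run, pass a fake invoker such as `lambda name, payload: (name, json.loads(payload))` and check that you get back `count` results.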
To deal with the first problem, we have developed a method in Thundra. If you want to keep N containers up all the time, we do not keep sending N warm-up calls. Instead, we send N/2 + R warm-up calls more frequently and N warm-up calls less frequently, where R is smaller than N/2. For example, if you want to keep 10 containers up at once, you can send 10 parallel warm-up calls every 30 minutes and a random number of calls between 5 and 10 every 5 minutes. This way, warm-up calls occupy all of your containers only once every 30 minutes, while your Lambdas stay warm the entire time.
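The schedule above can be sketched as a function that decides how many parallel warm-up calls to send at each 5-minute tick. This is an illustration, not Thundra's implementation; it follows the worked example (5 to 10 calls for N = 10), so R may reach N/2 at the top of the range. For the strict bound R < N/2, use `rng.randint(0, n - n // 2 - 1)` instead.

```python
import random

FULL_INTERVAL_MIN = 30  # every 30 minutes: warm all N containers
TICK_MIN = 5            # every 5 minutes: warm a random subset


def warmup_count(minute, n, rng=random):
    """Number of parallel warm-up calls to send at the given minute.

    At multiples of 30 minutes, send all N calls; at the other 5-minute
    ticks, send N/2 + R calls, so the total lies between N/2 and N.
    """
    if minute % FULL_INTERVAL_MIN == 0:
        return n
    return n // 2 + rng.randint(0, n - n // 2)
```

A one-hour plan for 10 containers is then `{m: warmup_count(m, 10) for m in range(0, 60, TICK_MIN)}`: 10 calls at minutes 0 and 30, and 5 to 10 calls at every other tick.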
To deal with the second problem, each warm-up invocation should wait a little inside your Lambda (we prefer 100 ms, and this is configurable in Thundra). Because every container stays busy for a moment, a single container cannot catch all the parallel calls, and you can keep the desired number of containers warm at the same time.
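On the function side, the delay is simply a short sleep before returning from a warm-up call. In this sketch, `WARMUP_DELAY_MS` is a hypothetical environment variable (not one of Thundra's documented settings), defaulting to the 100 ms mentioned above, and the `{"warmup": true}` marker is again an assumption.

```python
import os
import time

# Hypothetical setting name; Thundra's own configuration keys may differ.
WARMUP_DELAY_MS = int(os.environ.get("WARMUP_DELAY_MS", "100"))


def handler(event, context):
    if isinstance(event, dict) and event.get("warmup") is True:
        # Keep this container busy briefly so that concurrent warm-up calls
        # cannot all be served one after another by the same container.
        time.sleep(WARMUP_DELAY_MS / 1000)
        return {"status": "warm", "delay_ms": WARMUP_DELAY_MS}
    return {"status": "handled"}
```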
Over time, the load on your application may increase, which means you need to keep more containers up to avoid latency. In this case, you need to increase the number of parallel warm-up calls to your Lambda containers. Similarly, you may need to decrease the number of parallel calls if traffic decreases over time. In such cases, it is better to scale the number of containers by a fixed factor; we prefer to double it or cut it in half.
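Scaling by a factor of two can be sketched as a small rule applied once per period. The input `busy_containers` is a hypothetical metric (the peak number of containers real traffic needed in the last period), and the thresholds and the `maximum` cap are illustrative assumptions rather than Thundra's actual policy.

```python
def next_concurrency(current, busy_containers, minimum=1, maximum=1000):
    """Double or halve the number of parallel warm-up calls.

    - Demand reached the current level: double (capped at `maximum`).
    - Demand stayed below half the current level: halve (floored at `minimum`).
    - Otherwise: keep the current level.
    """
    if busy_containers >= current:
        return min(current * 2, maximum)
    if busy_containers * 2 < current:
        return max(current // 2, minimum)
    return current
```

The halving condition is strict so that the halved value is never smaller than the demand that was actually observed.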
Sending warm-up calls with some randomization, making each warm-up invocation sleep for 100 ms, increasing or decreasing the number of parallel calls: all of this takes a lot of development effort just to maintain our serverless environment. We moved to serverless precisely because we did not want to spend time and energy on operational maintenance, right?
At this point, Thundra can do the necessary work for you. Once you adopt our warm-up module, you do not need to write any code to avoid cold starts. You only need to configure the module to decide how frequently to send warm-up calls and how many concurrent warm-up calls to make. There are many other configurable fields you can adjust; check our documentation to learn more.
You may worry that warm-up calls add cost to your Lambda. Let’s assume that you need to keep 10 containers up. You will make 20 calls per hour from the 10 parallel calls every 30 minutes, and around 75 calls per hour from the random warm-up calls every 5 minutes. That makes 95 invocations per hour, or 68,400 invocations per month. Assuming each warm-up invocation sleeps for 100 ms, this is totally free according to the AWS Lambda pricing calculator.
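The arithmetic behind these numbers can be checked directly. This sketch assumes a 30-day month, the uniform 5 to 10 random calls from the earlier example (7.5 on average), a 100 ms duration per call with billing granularity ignored, and the largest benchmarked memory size (3008MB) as a worst case. The free-tier limits used for comparison (1 million requests and 400,000 GB-seconds per month) are AWS's long-standing published values; check current pricing before relying on them.

```python
N = 10                                   # containers to keep warm
ticks_per_hour = 60 // 5                 # a warm-up round every 5 minutes: 12 rounds
full_rounds = 60 // 30                   # 2 of those rounds send all N calls

full_calls = full_rounds * N                         # 10 calls twice an hour
random_rounds = ticks_per_hour - full_rounds         # the remaining 10 rounds
avg_random_calls = (N // 2 + N) / 2                  # uniform 5..10, so 7.5 on average
random_calls = random_rounds * avg_random_calls

per_hour = full_calls + random_calls
per_month = per_hour * 24 * 30                       # 30-day month

duration_s = 0.1                                     # each warm-up call sleeps ~100 ms
memory_gb = 3008 / 1024                              # worst case among benchmarked sizes
gb_seconds = per_month * duration_s * memory_gb

FREE_TIER_REQUESTS = 1_000_000                       # per month; verify current pricing
FREE_TIER_GB_SECONDS = 400_000                       # per month; verify current pricing

print(f"{per_month:,.0f} invocations/month, {gb_seconds:,.1f} GB-seconds/month")
print("within free tier:",
      per_month <= FREE_TIER_REQUESTS and gb_seconds <= FREE_TIER_GB_SECONDS)
```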
To demonstrate how effectively our warm-up module tackles the cold start problem, we ran a benchmark with a control and a test Java Lambda function: one instance of the function ran without our warm-up module, and another instance of the same function had our warm-up module plugged in.
In our benchmark environment, we have a simple Lambda function and it completes its invocation in 50–100 ms. We deployed the same function with different configurations as shown below:
Note: Functions that should be warmed up are discovered automatically by having the `thundra_lambda_warmup_warmupAware` environment variable enabled.
To see the effects of memory, invocation frequency, and warm-up at the same time, we tested our warm-up-enabled functions with varying memory allocations and invocation frequencies. We used 4 different memory allocations (512MB, 1024MB, 1536MB, 3008MB) and 2 different invocation frequencies (every minute and every 5 minutes). Therefore, we have 8 Lambda functions with the warm-up module and 8 identical functions without it.
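The benchmark matrix is the Cartesian product of memory sizes, invocation intervals, and warm-up on or off, which yields 16 deployments. The sketch below just enumerates them; the naming scheme is hypothetical, and setting `thundra_lambda_warmup_warmupAware` to `"true"` only on the warmed-up functions is an assumption about how the variable is enabled.

```python
from itertools import product

MEMORY_MB = [512, 1024, 1536, 3008]
INVOCATION_INTERVAL_MIN = [1, 5]
WARMUP_ENABLED = [True, False]

configs = [
    {
        # Hypothetical naming scheme for the deployed benchmark functions.
        "name": f"bench-{mem}mb-{interval}min-{'warm' if warm else 'control'}",
        "memory_mb": mem,
        "interval_min": interval,
        # Assumed: only warmed-up functions set the warm-up awareness variable.
        "env": {"thundra_lambda_warmup_warmupAware": "true"} if warm else {},
    }
    for mem, interval, warm in product(MEMORY_MB, INVOCATION_INTERVAL_MIN, WARMUP_ENABLED)
]
```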
We used other Lambda functions to make test invocations to these Lambdas. They are set up to send concurrent invocations: at each iteration, the number of concurrent invocations is randomized between 1 and 16 to simulate real-world traffic. Therefore, we configured the `thundra-lambda-warmup` Lambda function to warm up each function with a concurrency factor of 16 by setting the `thundra_lambda_warmup_warmupInvocationCount` environment variable to 16.
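One iteration of such a load driver can be sketched as follows: pick a random burst size between 1 and 16 and fire that many invocations concurrently. Because a burst can need up to 16 containers at once, the warm-up invocation count is set to the same maximum. Here `invoke` is an injected, hypothetical callable standing in for a real Lambda invocation.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Matches thundra_lambda_warmup_warmupInvocationCount=16 in the benchmark setup.
MAX_CONCURRENCY = 16


def run_iteration(invoke, rng=random):
    """Send a random burst of 1..MAX_CONCURRENCY concurrent invocations."""
    concurrency = rng.randint(1, MAX_CONCURRENCY)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: invoke(), range(concurrency)))
    return concurrency, results
```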
Here are the results of our experiment:
The results show that the more frequently a function is called, the less likely it is to hit a cold start; frequent calls act like natural warm-up messages. Even in this situation, our warm-up module reduces cold starts further. However, if the function is triggered infrequently, the probability of a cold start seems to increase almost exponentially. In this case, our warm-up module improved cold start performance tremendously, reducing cold start occurrence from 18.41% to 0.39%. Thanks to this improvement, the function became reliable enough to be used in production. Moreover, by tuning the configurable fields, it is possible to reduce cold starts even further.
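The post does not describe how cold starts were counted. One common technique, shown here as an assumption rather than Thundra's method, relies on module-level state being initialized once per container: a flag that starts as `True` marks only the first invocation a fresh container handles. Aggregating those flags gives a cold start percentage comparable to the figures above.

```python
# Module scope runs once per container, so this flag is True only for the
# first invocation handled by a fresh container, i.e. a cold start.
_is_cold = True


def handler(event, context):
    global _is_cold
    cold = _is_cold
    _is_cold = False
    return {"cold_start": cold}


def cold_start_rate(flags):
    """Percentage of invocations that reported a cold start."""
    flags = list(flags)
    return 100.0 * sum(flags) / len(flags) if flags else 0.0
```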