Resolving Cold Start in AWS Lambda

Yes, I know cold starts are frustrating in Lambda. Let's start the post with a formal introduction. AWS Lambda is revolutionising the cloud. It is the Swiss Army knife for cloud architects, and developers are coming up with interesting use cases for Lambda day by day.

Lambda is amazing. It is cost-efficient, highly scalable and serverless. But let's accept the fact that it does have some drawbacks. One of them is the cold start ❄️. What is a cold start? Let's go with the definition given by Yan Cui.

A cold start occurs when an AWS Lambda function is invoked after not being used for an extended period of time resulting in increased invocation latency.

In the case of a Lambda-powered web app, a cold start hurts the user experience because the backend takes a long time to respond to user queries. In a production setup, Lambdas are usually deployed within a VPC's private subnet, because they have to access the database, which is usually deployed in private subnets. Running a Lambda inside a VPC worsens the cold start: it can quadruple the invocation latency.

Statistics behind Cold Start

Lambda is billed per invocation (plus execution time), so it makes no sense for AWS to keep our functions warm all the time. Yan Cui has done excellent work using various strategies to understand cold starts in Lambda. Here are the takeaways from his experiments:

  • The idle timeout for a Lambda function is not constant. AWS will kill your functions depending on the demand and supply of resources in the given AWS region.
  • The higher the memory (RAM), the longer the function stays warm. (This does not apply all the time.)
  • Statically typed languages (Java, C#) experience higher cold start times than dynamically typed languages (Node.js, Python).
  • Deployment package size does not affect the cold start time.
  • Based on his experiments, we can assume that an idle function stays warm for at least 40 minutes. (Again, this does not apply all the time.)

The Solution (Hack)

The simple hack is to invoke your Lambda at a fixed interval. I invoke my Lambda every 20 minutes to keep it warm. Let me take my own use case to explain the solution in detail. Here is the cloud architecture of NJ2JP, an e-commerce application.

Serverless Cloud Architecture

This application is completely powered by AWS Lambda. As you can see, there is a wake-up Lambda next to the main Lambda in the private subnet. The wake-up Lambda invokes the main Lambda every 20 minutes to keep it warm, so that it can answer user queries without cold start latency. Let us also look at the specific architecture of the wake-up Lambda.

Wake-Up Lambda Architecture

You can see that the wake-up Lambda periodically warms up the Kingdom of Lambdas. I prefer to use Lambda in my cloud architecture as much as possible, so it is convenient to have one wake-up Lambda that warms up all the others.

Why is it better to warm Lambda every 20 minutes?

I prioritise user experience over system cost. Yan Cui's experiments suggest that in most cases an idle Lambda stays warm for 40 minutes, but I would like to be a little more cautious so that my users never experience cold start latency. This is the reason for warming up the Lambdas every 20 minutes. I agree that one of Lambda's strongest points is its low cost: requests cost 20 cents per million invocations. Let's do some simple math for the 20-minute wake-up Lambda:

(24 hours * 3 invocations per hour) * 31 (days in the month) = 2232 invocations
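The same arithmetic can be checked in a few lines of JavaScript. This is only a sketch: the function and constant names are made up for illustration, and it counts request charges only (Lambda also bills for execution time, which is ignored here).

```javascript
// Hypothetical helper names; only the numbers come from the post.
const PRICE_PER_MILLION_REQUESTS_USD = 0.20; // request price quoted above

// Number of scheduled invocations in a month for a given interval.
function monthlyInvocations(intervalMinutes, daysInMonth) {
  const perHour = 60 / intervalMinutes;
  return 24 * perHour * daysInMonth;
}

// Request charge only; duration charges are not included.
function monthlyRequestCostUsd(invocations) {
  return (invocations / 1e6) * PRICE_PER_MILLION_REQUESTS_USD;
}

const invocations = monthlyInvocations(20, 31);
console.log(invocations); // prints 2232
console.log(monthlyRequestCostUsd(invocations).toFixed(6)); // prints 0.000446
```

Note that every function the wake-up Lambda warms is also invoked 2232 times, so the total grows with the size of your kingdom of Lambdas.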

That is only about 2,200 invocations per month, which costs a fraction of a cent in request charges and is well within our budget. Enough rambling about the theory; let's jump into the implementation.

Wake-Up Lambda Implementation

I use the Serverless Framework to deploy my Lambda functions. So here is the serverless.yml file of the wake-up Lambda:

  • We will be using the AWS SDK's Lambda client to invoke the Lambda functions, so make sure you assign proper IAM permissions to the wake-up Lambda. Lines 13-21 show the required permissions.
  • The SDK needs the name of each function to invoke. Pass those function names in as environment variables. In the above file, the wake-up Lambda takes two function names to wake up. As your kingdom of Lambdas grows, this setup makes it easy to add new functions. Lines 22-24 show the Lambda function names taken as environment variables.
  • The interval at which the wake-up Lambda warms up all the other Lambdas has to be specified in the serverless.yml. Line 36 shows the fixed interval. Technically, it is created as a CloudWatch Events rule that invokes the wake-up Lambda every 20 minutes, which in turn warms up the subscribed Lambdas.
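The three points above can be sketched as a minimal serverless.yml. This is not the original file (so the line numbers quoted in the list do not apply to it), and the service name, function names, environment variable names and runtime are all assumptions for illustration. It uses the `iamRoleStatements`, `environment` and `schedule` settings of the Serverless Framework.

```yaml
service: wake-up-lambda            # hypothetical service name

provider:
  name: aws
  runtime: nodejs18.x              # assumption: use the Node.js runtime you actually deploy on
  iamRoleStatements:               # permission to invoke other functions
    - Effect: Allow
      Action:
        - lambda:InvokeFunction
      Resource: "*"                # assumption: narrow this to your own function ARNs
  environment:                     # names of the functions to warm (hypothetical)
    FUNCTION_ONE: my-service-dev-main
    FUNCTION_TWO: my-service-dev-search

functions:
  wakeUp:
    handler: handler.wakeUp        # hypothetical handler export
    events:
      - schedule: rate(20 minutes) # deployed as a scheduled CloudWatch Events rule
```

Adding another Lambda to the warm-up list then only means adding one more environment variable and reading it in the handler.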

Now, let's look at the handler.js that powers the wake-up Lambda.

  • Lines 3-8 are the initialisations. We use bluebird for promises and the AWS SDK for JavaScript to invoke the other Lambdas.
  • The invoke function of the SDK's Lambda client is used. It requires the function name, invocation type and payload. Lines 13-22 set up these parameters. Make sure that the payload is the same as a normal request's payload, or add an edge case inside the main Lambda to handle the wake-up Lambda's request.
  • Lines 24-47 warm up the subscribed Lambdas in parallel and print the results. If an invocation fails, the error is thrown.
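The steps above can be sketched roughly as follows. This is not the original handler.js: it swaps bluebird for native promises, assumes the v2 `aws-sdk` package is available (on newer Node.js runtimes it has to be bundled with the function), and the payload shape and the `FUNCTION_ONE` / `FUNCTION_TWO` environment variable names are hypothetical.

```javascript
// Build the parameters for one warm-up call: function name, invocation
// type and payload. The payload { source: 'wake-up' } is an assumption.
function buildInvokeParams(functionName) {
  return {
    FunctionName: functionName,
    InvocationType: 'RequestResponse', // wait for the target to respond
    Payload: JSON.stringify({ source: 'wake-up' }),
  };
}

// Invoke every function in parallel and reject if any invocation fails.
// `lambdaClient` is any object with the SDK v2 shape invoke(params).promise().
async function warmUp(lambdaClient, functionNames) {
  const results = await Promise.all(
    functionNames.map(async (name) => {
      const res = await lambdaClient.invoke(buildInvokeParams(name)).promise();
      if (res.FunctionError) {
        throw new Error(`${name} failed: ${res.FunctionError}`);
      }
      return { name, statusCode: res.StatusCode };
    })
  );
  results.forEach((r) => console.log(`warmed ${r.name} (${r.statusCode})`));
  return results;
}

// Lambda entry point. FUNCTION_ONE / FUNCTION_TWO are hypothetical
// environment variable names holding the functions to warm.
async function wakeUp() {
  const AWS = require('aws-sdk'); // required here so the helpers above run without the SDK
  const names = [process.env.FUNCTION_ONE, process.env.FUNCTION_TWO].filter(Boolean);
  return warmUp(new AWS.Lambda(), names);
}

module.exports = { buildInvokeParams, warmUp, wakeUp };
```

Passing the client into `warmUp` is a design choice for this sketch only: it lets you exercise the logic locally with a fake client before deploying.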

Tobiah Rex has done a nice job of writing this wake-up Lambda, which can be reused in your project. Check out the wake-up Lambda repo for instructions on deploying it.

Why use AWS-SDK to Wake-Up Lambda?

Each Lambda in your kingdom of Lambdas will have a different event source. The sources can be API Gateway, IoT, S3 events, SNS topics and so on. Our only goal is to keep our Lambdas warm so that they answer requests instantly. So the AWS SDK is the right choice for warming up Lambdas with different event sources. In addition, it reduces the time taken to invoke a Lambda compared to invoking it through the associated event source.
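One consequence of invoking through the SDK is that the main Lambda receives the wake-up payload instead of its usual API Gateway or S3 event. A minimal sketch of the edge-case branch mentioned earlier could look like this. The `source: 'wake-up'` marker, `handleRequest` and `main` are hypothetical names, not part of the original project.

```javascript
// Hypothetical marker that the wake-up Lambda puts in its payload.
const WAKE_UP_SOURCE = 'wake-up';

// True only for events sent by the wake-up Lambda.
function isWakeUpEvent(event) {
  return Boolean(event) && event.source === WAKE_UP_SOURCE;
}

// Stand-in for the real business logic of the main Lambda.
async function handleRequest(event) {
  return { statusCode: 200, body: JSON.stringify({ ok: true }) };
}

// Main Lambda handler: return early for warm-up calls so they stay cheap
// and skip the normal request handling.
async function main(event) {
  if (isWakeUpEvent(event)) {
    return { warmed: true };
  }
  return handleRequest(event);
}

module.exports = { main };
```

Returning early keeps the warm-up invocations short, which matters because Lambda also charges for execution time.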

Takeaways from Wake-Up Lambda

  • Warm up 🔥 your kingdom of Lambdas using a single wake-up Lambda.
  • Warm up 🔥 your Lambdas every 20 minutes for a better user experience.
  • Use the AWS SDK to warm up your kingdom of Lambdas. This reduces the invocation time and makes it uniform across Lambdas with different event sources.

Thank you for reading. If you find something wrong or know a better way to do it, let me know in the comments below.

If you like the post, hit the 👏 button below so that others may find it useful. You can follow me on Twitter.