Provisioned Concurrency: The Silver Bullet to AWS Lambda Cold Starts

Dec 30, 2019

The year 2014 marked the start of the serverless era with Dr. Werner Vogels announcing AWS Lambda to an ecstatic crowd at AWS’ esteemed re:Invent. What was promised was a compute service that abstracted away all the pains of cloud orchestration, leaving the user to worry only about the business logic they would run on these magic little worker nodes spun out of thin air and pure engineering.

AWS Lambda was not the first FaaS service out there, as a startup called PiCloud had been offering FaaS back in 2010, but AWS was the first major cloud provider to jump into the race. In fact, it can be argued that AWS is the one that kick-started the serverless era, soon followed by Google Cloud Functions and Azure Functions by Microsoft. By 2017 the cloud wars had intensified, with more and more providers descending upon the battlefield, all championing one promise: no more orchestration needed.

In the following years serverless grew in popularity, but alas the adoration of serverless began to wane. Many blamed the limitations that cloud vendors put on their FaaS offerings for holding back mass adoption of serverless services. However, one of the primary reasons for avoiding serverless was cold starts, a phenomenon explained in the following sections.

However, AWS may have just announced the silver bullet to the much-dreaded cold start, and this has been offered in the form of Provisioned Concurrency. Half a decade after kick-starting the serverless train, AWS has shoveled in new coal with Provisioned Concurrency to accelerate the trend. Therefore, the purpose of this piece is to explain how Provisioned Concurrency works, and also answer the question, does the coming of this new feature still keep serverless technologies on the same track as was first laid out?

The Inherent Problem

To understand the solution to cold starts we need to understand why cold starts occur in the first place. They could generally be defined as the set-up time required to get a serverless application’s environment up and running when it is invoked for the first time within a defined period. With this understanding, we also accept that cold starts are somewhat of an inherent problem with the serverless model.

Serverless applications run on ephemeral containers, the worker nodes, and the management of these nodes is the responsibility of the platform provider. This is where the wonderful features of auto-scalability and pay-as-you-go come from, since vendors such as AWS can manage the resources to match exactly the requirements of your running application.

The problem here though is that there is latency in getting these worker nodes in the form of ephemeral containers up for your first invocation. After all, the serverless principle is that you utilize resources when required, and when not required those resources theoretically do not exist.

Hence this unavoidable latency is what actually degrades the performance of your applications. This is especially true when you are building serverless applications that are meant to be time-sensitive, as almost all customer-facing applications are.
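To make the idea of a "first invocation" concrete, here is a minimal sketch of a Lambda-style handler in Python. It relies on the fact that module-level code runs once per worker node, while the handler runs on every invocation. The names `handler`, `_is_cold`, and `_INIT_STARTED` are our own illustrative choices, and the flag only marks the first call on a worker node; it does not measure the set-up latency itself.

```python
import time

# Module-level code runs once per worker node (execution environment),
# that is, during initialization, not on every invocation.
_INIT_STARTED = time.monotonic()
_is_cold = True


def handler(event, context):
    """Hypothetical handler that reports whether this call was the first on its node."""
    global _is_cold
    was_cold = _is_cold
    _is_cold = False  # every later call on this worker node is warm
    return {
        "cold_start": was_cold,
        "seconds_since_init": time.monotonic() - _INIT_STARTED,
    }
```

Calling `handler` twice in the same process returns `cold_start` as `True` the first time and `False` afterwards, which mirrors what happens when a worker node is reused.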

This latency varies from vendor to vendor, and across programming languages. For example, FaaS functions written in .NET usually have higher latency than those written in other programming languages, and Mikhail Shilkov was one of the personalities in the field to confirm this in his famous piece Comparison of Cold Starts in Serverless Functions across AWS, Azure, and GCP.

Nevertheless, we are witnessing the community improving cold start performance. For example, if we refer back to cold starts in .NET, we see newer versions of .NET performing better than their predecessors.

The graph below, for example, illustrates how cold starts improve over newer versions of .NET, across various memory allocations for the AWS Lambda functions. However, we are still experiencing latency which could be devastating, especially considering the results below arise from a simple hello world .NET AWS Lambda function.

Cold Start Durations Across Various .NET Versions

Of course, this is an issue that the serverless community has been dealing with for a while now, and it has come up with various strategies to overcome the latency. Moreover, third party SaaS tools such as Thundra.io provide solutions such as cold start monitoring and warming triggers in an attempt to mitigate the pain that cold starts bring with them.
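A warming trigger usually means a scheduled event that pings the function so that a worker node stays alive, plus a small code change so the function recognizes the ping and returns early. The sketch below shows that code change only. The `warmup` key and the `do_business_logic` function are hypothetical names chosen for illustration, not the format of any particular tool.

```python
WARMUP_KEY = "warmup"  # hypothetical marker that a scheduled trigger would set


def do_business_logic(event):
    """Placeholder for the real work of the function."""
    return {"status": "processed", "input": event}


def handler(event, context):
    # Short-circuit on warm-up pings so they keep the worker node alive
    # without running (or paying for) the real business logic.
    if isinstance(event, dict) and event.get(WARMUP_KEY):
        return {"status": "warmed"}
    return do_business_logic(event)
```

This is exactly the kind of extra trigger and code change that a platform-level solution would make unnecessary.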

Unfortunately, none of the solutions are perfect, and this is where Provisioned Concurrency comes into play. On the spectrum of solutions to AWS Lambda cold starts, AWS’ Provisioned Concurrency probably sits closest to achieving the goal of zero cold starts.

The Workings of Provisioned Concurrency

Knowing that the major reason behind cold starts is the time taken to initialize the computing worker nodes, AWS’ Provisioned Concurrency solution is quite simple: have those worker nodes initialized already!

The concept here is that you can now decide how many of these worker nodes you would like to keep initialized for your time-sensitive serverless applications. These worker nodes reside in a frozen state with your code downloaded and the underlying container infrastructure all set. They are therefore technically still not using up any resources, yet the benefit is a response time in the double-digit milliseconds. This is a considerable improvement over cold start latency that can creep into the seconds, as in the .NET example whose cold start durations are illustrated above.

That means, depending on the number of concurrent worker nodes you have provisioned, invocations are routed to provisioned worker nodes before on-demand worker nodes, thus avoiding cold starts due to the need for initialization. It would thus be wise to provision a higher number of worker nodes for expected spikes in traffic. For example, a movie ticketing system could expect a higher rate of traffic on its site at the time tickets for a popular show go on sale.

If the tickets go on sale at 6 pm, then you would expect a higher number of requests, meaning a higher number of invocations of the function. As ticket sales continue and all the show’s tickets get sold out, you can then expect traffic to drop. Therefore, you would no longer need as many provisioned concurrent worker nodes.

If the provisioned concurrent worker nodes fail to accommodate all incoming invocations, then the overflow invocations are handled conventionally, with on-demand worker nodes being initialized per request. Overall, however, the latency displayed by your serverless application improves.
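The routing rule described above can be sketched as a small model: provisioned worker nodes absorb invocations first, and only the remainder spills over to on-demand worker nodes. This is a simplification for illustration, not AWS's actual scheduler, and the demand figures for the ticketing example are made up.

```python
def route_invocations(concurrent_requests, provisioned):
    """Split concurrent requests into (warm, spillover) under a provisioned-first rule."""
    if concurrent_requests < 0 or provisioned < 0:
        raise ValueError("counts must be non-negative")
    warm = min(concurrent_requests, provisioned)
    spillover = concurrent_requests - warm  # handled by on-demand worker nodes
    return warm, spillover


# Hypothetical concurrent demand around a 6 pm ticket sale (hour -> requests).
demand_by_hour = {17: 20, 18: 500, 19: 300, 20: 40}
provisioned_nodes = 300

for hour, demand in demand_by_hour.items():
    warm, spillover = route_invocations(demand, provisioned_nodes)
    print(f"{hour}:00  warm={warm}  spillover={spillover}")
```

In this made-up profile only the 6 pm peak spills over, which suggests raising the provisioned count shortly before the sale and lowering it again once traffic drops.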

There are various ways to provision concurrent worker nodes. The main methods are the AWS console itself and the AWS API. Moreover, with the launch, AWS has teamed up with third party partner tools to facilitate the provisioning of these concurrent worker nodes. For example, Thundra.io is one such partner and allows you to monitor these provisioned worker nodes, including the number of provisioned concurrent nodes compared to the spillover invocations that get routed to on-demand worker nodes.
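As a rough sketch of the API route, the snippet below uses the `put_provisioned_concurrency_config` call of the boto3 Lambda client. The function name `ticket-sales`, the alias `live`, and the count of 300 are hypothetical values. Provisioned Concurrency is configured on a published version or alias, so the qualifier here is assumed to be an existing alias. The request is built separately from the call so it can be inspected without touching AWS.

```python
def build_provisioned_concurrency_request(function_name, qualifier, executions):
    """Build keyword arguments for the Lambda PutProvisionedConcurrencyConfig call."""
    if executions < 1:
        raise ValueError("executions must be at least 1")
    return {
        "FunctionName": function_name,
        "Qualifier": qualifier,  # a published version or alias
        "ProvisionedConcurrentExecutions": executions,
    }


def apply_provisioned_concurrency(request):
    """Send the request to AWS. Needs boto3 and valid credentials at call time."""
    import boto3  # third-party; imported lazily so the sketch loads without it

    client = boto3.client("lambda")
    return client.put_provisioned_concurrency_config(**request)


# Hypothetical values, for illustration only.
request = build_provisioned_concurrency_request("ticket-sales", "live", 300)
```

Calling `apply_provisioned_concurrency(request)` would then submit the configuration for the hypothetical function.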

There are, however, some limitations to the number of provisioned concurrent worker nodes you can reserve. For example, the number of unreserved worker nodes cannot fall below 100.
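The arithmetic behind this limit is simple enough to sketch. The function below assumes an account-wide concurrency limit that you supply yourself (1000 is used here only as an example figure) and treats the floor of 100 unreserved worker nodes stated above as a constant. The parameter `reserved_elsewhere` is a hypothetical name for concurrency already set aside for other functions.

```python
MIN_UNRESERVED = 100  # floor on unreserved worker nodes, as stated above


def max_provisionable(account_limit, reserved_elsewhere=0):
    """Upper bound on worker nodes that can still be set aside for one function."""
    return max(0, account_limit - MIN_UNRESERVED - reserved_elsewhere)


# Example with an assumed account-wide limit of 1000.
print(max_provisionable(1000))        # nothing reserved elsewhere
print(max_provisionable(1000, 400))   # 400 already reserved for other functions
```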

Provisioned worker nodes can therefore be used to avoid cold starts. Those using Lambda functions no longer need to set up extra triggers or perform code changes to mitigate the latency problem. However, Provisioned Concurrency means the worker nodes are present and ready to take on requests to the application. This leads us to another, more philosophical question. Is Provisioned Concurrency cheating?

The Serverless Conundrum

Provisioned Concurrency kills the cold start problem, but alas the ideal serverless dream also suffers collateral damage. After all, serverless was meant to be a fully managed on-demand service. However, with Provisioned Concurrency, neither are these services fully managed for you nor are they on-demand, hence redefining the proverbial definition of what a FaaS service is.

Serverless was built on three great pillars as mentioned below:

pay-as-you-go
fully managed
auto scalable

With Provisioned Concurrency, however, we see a divergence from the first two characteristics. Firstly, you have to manage the number of reserved worker nodes, and secondly, you have to pay for these worker nodes by the hour according to AWS’s pricing plans for the feature.

We may still salvage the ‘fully-managed’ clause of serverless services though. Even though we have delved into some of the management responsibilities of Lambda functions by deciding the number of worker nodes to provision, a large and substantial part of the resource management is still being handled by the cloud vendor.

The only reason we are reserving concurrent worker nodes for the Lambda function is to overcome the cold-start problem. Apart from the number of resources, all other responsibilities are still managed by AWS. Moreover, when the number of reserved worker nodes falls short, we switch over to on-demand worker nodes seamlessly without any intervention from the user. Therefore, we have the capability of overcoming cold starts with a dent to the ‘fully-managed’ clause, but nonetheless, a dent required to fit the serverless model into the practical caveats of the real world.

Thus we can get over the fact that AWS Lambda functions may no longer be fully managed as per the ideological definition of a serverless service. It is a small price to pay to achieve serverless adoption as once envisioned. On the other hand, it is the ‘pay-as-you-go’ trait of the serverless model that takes the greatest hit.

Provisioned concurrent worker nodes incur a fixed cost per hour, irrespective of whether or not the worker node is processing requests. The ‘pay-as-you-go’ trait was one of the biggest advantages to the serverless model, but then again, the cold start problem was the greatest disadvantage. Therefore, we have had to woefully sacrifice the queen to checkmate cold starts. A sacrifice with a cloud of contention.
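To see why a fixed hourly cost cuts against pay-as-you-go, consider a back-of-the-envelope sketch. The rate of 0.015 per worker node per hour is a made-up placeholder, not AWS's actual price, and the model ignores any per-request charges. It only separates the fixed bill into the share spent on busy time and the share spent on idle time.

```python
def fixed_cost(nodes, hours, rate_per_node_hour):
    """Cost of keeping worker nodes provisioned, whether or not they are used."""
    return nodes * hours * rate_per_node_hour


def idle_cost(nodes, hours, rate_per_node_hour, utilization):
    """Share of the fixed cost paid for time in which the nodes sat idle."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return fixed_cost(nodes, hours, rate_per_node_hour) * (1.0 - utilization)


# Placeholder figures: 10 nodes for 24 hours at a hypothetical rate, busy 25% of the time.
total = fixed_cost(10, 24, 0.015)
idle = idle_cost(10, 24, 0.015, 0.25)
print(f"total={total:.2f}  paid for idle time={idle:.2f}")
```

With these placeholder figures, three quarters of the bill pays for worker nodes that were doing nothing, which is precisely the cost the on-demand model was meant to eliminate.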

The issue is exacerbated once it is known that the reserved worker nodes do not actually preserve state. It is already known that FaaS functions are stateless, and many devise workarounds to the state caching problem of serverless functions. Provisioned worker nodes, even though reserved, do not preserve state either, and hence still exhibit the same issue expected of on-demand worker nodes. Consequently, what AWS is charging for is basically just the warm-up events, a concept for which Jeremy Daly already hosts an open-source project.

Quoted from Forrest Brazeal’s “The hidden cost of Lambda provisioned concurrency pricing”

The community may have to accept a scratch on the fine ‘pay-as-you-go’ trait, but it is believed that the benefits of overcoming cold starts are manifold. This tips the scales in favor of serverless adoption, considering how we have seen that cold starts are in fact an inherent problem of the serverless concept. To see the successful mass adoption of serverless as the community envisioned, there are some minor costs our ideal serverless environment must incur, and AWS has taken that bold step with Provisioned Concurrency.

If there were anyone who could so poetically capture the opinions and pleas that enwreathe Provisioned Concurrency, it is the Greek fabulist Aesop, who once said: “A crust eaten in peace is better than a banquet partaken in anxiety.” We cannot reap the ideal benefits offered by serverless while constantly looking over our shoulders for that dreaded cold start lurking in the uncertainty.

Originally published at https://blog.thundra.io.

Written by Sarjeel Yusuf