The challenges of AWS Lambda in production

Solving issues you may face when deploying Lambda at your company

Lucas De Mitri
Sinch Blog
13 min read · Apr 12, 2021

Lambda was first introduced by AWS in 2014, and it was soon hailed as the pinnacle of microservices, the ultimate unit of deployment, the newest innovation that would make traditional managed IT look like the stone age. The catchy “serverless” terminology was coined. The buzzwords were marketed. The hype was set.

Lambda was quick to amass a crowd of early adopters; we developers are particularly good at chasing the shiniest new tech in the hope of boosting our productivity and stepping up our engineering game. I was curious. At conferences, I’d always choose to attend the serverless talks to check what it was all about, wondering when I’d have the chance to try it out at my company.

Well, these last years I did just that!

These are the experiences I’ve garnered along with our amazing team at Wavy Global (now part of Sinch) during the development of several projects. They were developed entirely on AWS Lambda, with no on-premises servers, allocated VMs, or containerized services, which exposed us to a wide variety of technical challenges I will outline in depth here.

When to use Lambda?

Some of the selling points of AWS Lambda are its low compute price, on-demand down-to-zero scaling, and of course, not having to manage server infrastructure.

It’s a flexible compute runtime that can handle a variety of tasks; here are a few typical examples:

  • Offloading resource-intensive workloads from an existing service — e.g. processing big files;
  • Running once-a-day jobs in parallel — it can be triggered with a cron schedule;
  • Building cloud-native reactive services — e.g. receiving an S3 file upload notification;
  • Performing ETL on streamed data with SQS or Kinesis;
  • Exposing existing APIs while transforming request/response formats.

FaaS in general is smoothly integrated with the rest of the cloud provider’s offerings, making it incredibly easy to set up a new service and just focus on building your product.

That said, this article will give us a clearer technical perspective on the pitfalls of serverless development and how to avoid them, which contributes to a better picture of when to use Lambda than the marketing bullets mentioned above.

Instance Lifecycle

I’d like to begin by expanding on the implications of “not having servers to manage”. Since the execution of functions is completely dependent on AWS, there are a few details we should know in order to properly develop code that works well with the runtime. The underlying instance orchestration has a few idiosyncrasies that aren’t commonly found elsewhere, as shown in this picture.

Sketch of a lambda instance lifecycle

When a function is triggered, AWS downloads your code and creates a new instance to process the incoming event. This initialization is called a cold start. It can take quite some time depending on application size, dependencies, and startup logic (establishing connections, loading resources or runtime configurations).

Next, the instance starts processing the event in what we call a hot state. Unlike a typical application, the function instance can only handle a single event at a time; if it’s triggered concurrently, AWS will spin up another instance. I’ve seen quite a few developers with misconceptions about this: it doesn’t matter if your code is perfectly pure and designed to handle concurrent invocations on the same instance, or whether you allocated 512MB or 1GB for the function, all of that memory will be dedicated to a single invocation at a time.

Once the event has been handled, the instance becomes warm. This state means it has been initialized but no events are currently being processed, so it’s ready to accept incoming ones. Warm instances are reused across invocations so we don’t have to pay the price of a cold start every time our Lambda function is invoked.

It’s up to AWS to decide how long warm instances stay up; it can be anywhere from 20 minutes to an hour, but there are no guarantees unless you configure provisioned concurrency to keep a number of warm instances always available.

Managing DB connections

Considering that the same function instance can be reused between invocations, we should avoid reinitializing resources inside the handler function, and instead reuse the same resources across invocations.

A common kind of resource eligible for reuse is a database connection. Establishing a new connection with RDS can take up to 200ms, which would be an expensive cost to pay on every execution. In Node.js, the way to get around this is to set the option callbackWaitsForEmptyEventLoop to false and then either just leave the connection open or use a Pool limited to 1 connection.
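Here’s a minimal sketch of that pattern, assuming the pg driver and an RDS Postgres instance (the table and environment variable names are illustrative):

    const { Pool } = require('pg');

    // created once per instance and reused while the instance stays warm
    const pool = new Pool({ connectionString: process.env.DATABASE_URL, max: 1 });

    exports.handler = async (event, context) => {
      // don't wait for the idle pooled socket before sending the response
      context.callbackWaitsForEmptyEventLoop = false;
      const { rows } = await pool.query('SELECT * FROM users WHERE id = $1', [event.userId]);
      return rows[0];
    };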

The effect of this option is self-explanatory: even after your handler has returned (or invoked the callback) with the result of the invocation, the Lambda Node.js runtime waits until the event loop is empty before sending out the response. This is the default behavior, meant to guarantee any pending operations finish before the environment is frozen.

We should also pay attention to the consequences of switching this option. Code that worked previously could be broken as shown in the following example.
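The original snippet isn’t reproduced here, so below is a minimal sketch of the same kind of bug, assuming a hypothetical doWork() helper and notification endpoint:

    const https = require('https');

    exports.handler = async (event) => {
      const result = await doWork(event); // doWork() is a hypothetical helper
      // fire-and-forget: nothing keeps this request alive after return
      https.get('https://example.com/notify');
      return result;
    };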

The HTTP request at line 6 isn’t guaranteed to resolve

There’s a pending HTTP request that may not finish because the runtime will be frozen as soon as the control flow hits return.

Besides errors as simple to detect as the one above, we may also run into more subtle ones that could slip by tests. Since the runtime may carry resources from a previous run, we could potentially cause resource leaks as the instance is reused across invocations. We once had an issue in which a rarely used function had the following code:
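The original code isn’t reproduced here; the following is a minimal sketch of its shape, assuming the pg driver and an illustrative items table:

    const { Client } = require('pg');

    exports.handler = async (event) => {
      // a brand new connection is opened on every invocation...
      const client = new Client({ connectionString: process.env.DATABASE_URL });
      await client.connect();
      const { rows } = await client.query('SELECT * FROM items WHERE id = $1', [event.id]);
      return rows[0]; // ...and it is never closed
    };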

Find the resource leak

This function will open a new DB connection every time it’s invoked. The mistake here is that we’re never closing it with client.end(). This error would have been easy to detect during tests had callbackWaitsForEmptyEventLoop been left enabled, since the execution would simply hang and time out due to the open socket; with the option disabled, however, old connections just pile up as the function keeps being invoked.

Resource leaks can happen in any application and as you can see, we have to be mindful of them in Lambda too.

Moving on, you may have noticed we’ve successfully solved the problem of opening and closing connections all the time by reusing them across invocations, but now we’ve created an even bigger problem, because these instances will sit around holding idle connections until they are scaled down. Oh well…

DB connection limit

One of the most common problems faced — if not the most common — is reaching the connection limit of relational databases as your lambda invocations increase.

This is quite unfortunate: you have a default limit of 1000 concurrent invocations (which can be increased), but smaller RDS instance types are limited to 100–200 connections in their stock configurations, so they aren’t able to keep up with that many clients performing queries at the same time.

Now, don’t get this wrong: a single MySQL or PostgreSQL node is very powerful and can handle a huge number of transactions per second, but it isn’t built for clients that hold idle connections or establish new connections at a high frequency, as happens with Lambda. The problem here is trying to combine a horizontally scalable thing with a vertically scalable thing.

There are a few main ways to address that issue:

1. Vertically scale your DB

You may be able to allocate more physical resources and fine-tune your relational DB server. A general purpose db.t3.xlarge PostgreSQL instance can handle up to 1800 connections, for example.

What makes this approach uninteresting for our scenario is that it goes against the serverless ethos of scaling on demand and paying as you go. If the load is low, you’re paying for unused hardware. If the load is unusually high, it won’t autoscale.

2. Limit your queries to the capacity of the database

In read-heavy workloads, you may be able to lower database usage by caching results on API Gateway or even Redis (which can accept up to 10k connections), but that could introduce its own set of problems: reading stale values, invalidating changed items, handling schema changes in cached objects, or dealing with cache stampedes.
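For the read-heavy case, a minimal cache-aside sketch (assuming the ioredis client; the key names and TTL are illustrative) could look like this:

    const Redis = require('ioredis');
    const redis = new Redis(process.env.REDIS_URL);

    async function getProduct(id, db) {
      const cached = await redis.get(`product:${id}`);
      if (cached) return JSON.parse(cached); // may be up to 60 seconds stale

      const { rows } = await db.query('SELECT * FROM products WHERE id = $1', [id]);
      await redis.set(`product:${id}`, JSON.stringify(rows[0]), 'EX', 60); // cache for 60s
      return rows[0];
    }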

In write-heavy workloads, you may set up an SQS queue and perform batch inserts at a fixed rate by setting a reserved concurrency on the consumer functions, if application logic allows for it (sadly, it often doesn’t).
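A minimal sketch of that consumer, assuming a hypothetical insertBatch() helper that performs a single multi-row INSERT:

    exports.handler = async (event) => {
      // SQS delivers messages in batches; parse them and write them in one statement
      const payloads = event.Records.map((record) => JSON.parse(record.body));
      await insertBatch(payloads); // hypothetical helper doing one multi-row INSERT
    };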

3. Use a NoSQL database

You may also choose to use a document store such as DynamoDB, which allows you to specify read/write capacity units beforehand or to simply autoscale depending on what suits you best.

Indeed, an autoscaling DB is more aligned with the serverless model than paying a fixed amount for idle computing power you won’t use most of the time. Moreover, Dynamo Streams offers Lambda triggers out of the box notifying when a table’s content changes, allowing you to build reactive event-driven architectures without worrying about the plumbing; AWS does it magically for you.
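As a minimal sketch (the table fields are illustrative), a stream-triggered function receives batches of change records like this:

    exports.handler = async (event) => {
      for (const record of event.Records) {
        if (record.eventName === 'INSERT') {
          // stream records use DynamoDB's attribute-value format (.S, .N, ...)
          const newItem = record.dynamodb.NewImage;
          console.log('item created:', newItem.id.S);
        }
      }
    };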

DynamoDB should allow you to handle a significant number of concurrent invocations without bringing down your service, but switching an ACID DB for a document store with eventually consistent reads is simply not an option in some cases. Onto the fourth option.

4. Use RDS proxy

The fourth option, which has been generally available since June of 2020, is to use RDS Proxy, a service designed to act as a connection pool for serverless applications.

In traditional applications, we may handle hundreds of concurrent requests at the same time while only holding a few DB connections that are shared between requests as they need to perform queries during their execution. In Lambda we can’t really do that, because each invocation lives in its own separate runtime environment, so sharing sockets isn’t possible.

RDS Proxy solves this by being the middleman: it holds a pool of connections to the database and lets clients connect to it instead, transparently multiplexing real connections when your functions need to perform DB operations.
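From the function’s point of view, the proxy is just another database endpoint, so a minimal sketch (PROXY_ENDPOINT and the credentials are illustrative environment variables) only changes where the client connects:

    const { Client } = require('pg');

    exports.handler = async (event) => {
      const client = new Client({
        host: process.env.PROXY_ENDPOINT, // the proxy's endpoint, not the database's
        database: process.env.DB_NAME,
        user: process.env.DB_USER,
        password: process.env.DB_PASSWORD,
      });
      await client.connect(); // cheap: the proxy multiplexes a pooled real connection
      const { rows } = await client.query('SELECT 1');
      await client.end();
      return rows;
    };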

This diagram compares connection handling in three different scenarios.

  • On the left, the service has a traditional connection pool;
  • In the middle, the service inefficiently leaves an open connection for every instance;
  • On the right, RDS Proxy acts as a connection pool for the serverless application.

5. Use Aurora Serverless v2

Aurora is a distributed relational DB provided by AWS that forks MySQL/Postgres and reworks a bunch of things by “moving logging and storage out of the database engine”. In doing so, it leverages other AWS services to add HA capabilities and improve performance in a few scenarios. If you are interested, the Aurora paper details their architectural achievements.

Aurora Serverless, as its name says, is a spin-off that builds on the separation of database engine and storage layer to provide instance autoscaling at a minimal cost. Oddly enough, AWS started working on Aurora Serverless v2 in order to overcome some limitations of the initial version, which couldn’t scale fast enough.

Sadly, v2 is not in GA at the time of writing, so we’ll have to wait before using it in production, but once it is, it promises lightning-fast on-demand scaling that can keep up with real-world usage.

Other shared memory concerns

We could extrapolate the problems faced interacting with databases to pretty much anything that involves an always-open TCP connection, such as HTTP connection pools and dealing with message brokers.

Or take, for example, a circuit breaker library, which is usually implemented by relying on shared memory to store, look up, and react to the health of downstream services. We’d have to rely on a service mesh or share state via Redis to implement it.
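As a minimal sketch of the Redis approach (the key name and thresholds are illustrative, and a real breaker would track more state):

    const Redis = require('ioredis');
    const redis = new Redis(process.env.REDIS_URL);

    async function callWithBreaker(fn) {
      const failures = Number(await redis.get('breaker:payments:failures')) || 0;
      if (failures >= 5) throw new Error('circuit open: skipping downstream call');

      try {
        const result = await fn();
        await redis.del('breaker:payments:failures'); // reset on success
        return result;
      } catch (err) {
        await redis.incr('breaker:payments:failures');
        await redis.expire('breaker:payments:failures', 30); // allow retries after 30s
        throw err;
      }
    }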

I’m sure you can think of more examples. This boils down to the fine-grained scaling represented by the 1:1 correspondence between invocations and instances. The lack of a shared runtime environment inevitably leads to excessive resource usage. To get around this, Lambda needs training wheels such as RDS Proxy.

Because of this, I’ve seen more than a few comparisons of serverless to the old Common Gateway Interface, in which the web server forks a separate process to dynamically generate a web page for each HTTP request. This is not a fair comparison, because the forking part (the cold start) only happens for a tiny fraction of invocations, but the resource usage caused by spawning a new OS process is analogous to that of a Lambda container instance.

Latency

When laying out a Lambda-based architecture, you would naturally start by planning a bunch of scattered Lambda functions, each with a single responsibility. These functions would then talk to each other to perform the greater task, just like microservices.

And just like microservices, if you were to do that you’d end up with the infamous latency problem. A web of services communicating over the network will incur a greater cost than a monolith performing local function calls and accessing its DB directly; that’s a fact of life.

I like to stay vigilant when it comes to performance. 100ms spent here, 200ms spent there, and that stuff adds up for the end user. API Gateway alone takes 100ms just to invoke your Lambda upon receiving a request. An empty SQS queue can take 2 minutes to deliver a message. Over time I’ve become more considerate of latencies when performing integrations.

If I were to build a new architecture today, I’d rather have as few hops as reasonably possible in the functions I create, which would not only amount to lower integration latencies between them, but also have the added bonus of lowering MTTR because we’d have simpler stack traces to follow instead of distributed tracing across a hundred things.

So in the end, our functions tend to grow a little too big in order to stay performant and maintainable. My advice is to scrap the arbitrary rules that a function should have X lines of code or perform Y things. Instead, focus on what matters to your team, even if it doesn’t look like a sexy architecture on a PowerPoint presentation.

Infinite scalability?

Lambda functions will autoscale almost indefinitely, and that is a blessing because it allows you to seamlessly handle invocation spikes, cruising through the highs and lows of the day without paying for more than necessary.

On the other hand, a large number of function instances may put so much pressure on your architecture that a pipe bursts somewhere. That pipe could be database connections, an internal service being overwhelmed, or a third-party API rate limit being reached.

For this reason, you might consider whether or not your functions should have a maximum number of concurrent invocations configured (they should). This is usually controlled either by setting a reserved concurrency directly on the Lambda function or by limiting the trigger, e.g. throttling requests via API Gateway.

As always, stress testing your serverless architecture should give you an idea of what workloads you’re prepared to deal with and what constraints could be improved. This is common practice for traditional applications, and serverless is no exception; after all, who knows what bottlenecks lie in your architecture? Maybe you could grow your maximum load by 10x with small tweaks to flaws revealed by stress testing.

The architecture’s scalability is always bound by something, even if it’s AWS itself. As long as you know what that something is, you’ll be good to go.

Infrastructure as code / Frameworks

How should you manage Lambda IaC? Well, the most common answer is to use the Serverless Framework. It uses a YAML file to describe things such as IAM, AWS region, Lambda triggers, allocated memory, and much more. With a single serverless deploy command, it’ll create the functions if they don’t already exist, assign your configurations, package, and deploy the code. The front-end folks are having a lot of fun getting their entire projects up with a single command; we can do that too with serverless.
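A minimal sketch of such a file (the service and handler names are illustrative):

    service: orders-api

    provider:
      name: aws
      runtime: nodejs14.x
      region: us-east-1
      memorySize: 512

    functions:
      createOrder:
        handler: src/orders.create
        reservedConcurrency: 50 # caps concurrent instances, as discussed above
        events:
          - http:
              path: /orders
              method: post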

If you prefer a tool officially supported by AWS, SAM should offer roughly the same features. It’s similar to serverless in the sense that it creates a Cloudformation stack to manage project resources, but it’s not exactly the same.

Then there is Serverless Stack, a relatively new framework that has been looking very promising. It innovates by providing a Live Lambda Development feature that tunnels traffic to your local machine, offering a tight feedback loop when you are coding and testing. It also allows you to specify your app in TypeScript, as opposed to a YAML file, by using AWS CDK under the hood. I’m definitely going to try it out as soon as I get the chance.

Some people have been using Terraform to manage their serverless infrastructure as an alternative to CloudFormation-based solutions like the ones mentioned above. Although it’s not very commonly applied to Lambda, I could see that making sense if the same project includes external infrastructure components, if Terraform is a company-wide tool for IaC, or even if you want to avoid silly CloudFormation limits, but be careful not to introduce unnecessary complexity.

Here’s an example of unnecessary complexity I’ve seen in the wild: the authors of a project needed to create 40 API endpoints, resulting in a Lambda for each, and their infrastructure team decided to create separate repositories and multibranch Jenkins pipelines spanning 3 deployment stages, which amounted to 40 repos and 120 pipelines, so it got out of hand pretty quickly.

Please don’t do this.

Do not create separate Jenkins pipelines and Terraform projects just because that has worked in the past with non-serverless applications. Lambda services tend to grow their number of functions over time as new features are developed, and their deployments often involve updating several functions, which favors IaC that manages all of them at once.

Remember, one of the advantages of Lambda is to just ship the code without having infrastructure management get in your way, so this is definitely something you don’t want to overengineer. Creating functions or updating their configurations should be easy and quick. In general, a framework-aided monorepo will do pretty well for functions that belong to the same service and/or are exposed under the same API Gateway.

Conclusion

We’ve only explored a small set of Lambda features and difficulties; there’s a lot more to it than presented here. We could look into layers, step functions, the myriad of triggers, monitoring, observability, canaries, testing, and so on.

All in all, Lambda is an incredible tool for fast development and prototyping. It allows you to get an API up and running with negligible friction, leveraging other AWS offerings to create a minimally viable but powerful architecture, provided that you’re mindful of its limitations.

In case your product outgrows Lambda in the future and you have to migrate to containers or VMs, I consider that a win, because Lambda allowed you to get there quickly at a low operating cost, with the possibility of failing fast in the first place, and that’s what matters.
