Serverless Pitfalls: Issues With Running a Startup on AWS Lambda
Here at EmailDelivery.com, we’ve made heavy use of AWS Lambda from our product’s inception more than a year ago. In particular, we use the Zappa framework, which makes deploying serverless Python web apps a breeze.
In the process, we learned a lot about what running a website on Lambda means and what one can gain from doing so. We also experienced a few hiccups which we’re going to list here, in the hope that being aware will save others time and headaches.
The Lure of Serverless
On paper, Lambda’s granularity (each function invocation is metered in 100ms increments) makes it capable of some unique things. Need to instantly spin up enough server resources to handle tens of thousands of incoming requests? Lambda can handle that. Then, the moment they stop being used, all those functions will instantly disappear, instead of racking up idle charges. And although Lambda compute time is more expensive hour for hour than EC2, when hosting on Lambda, running a web site that isn’t being used costs you nothing. Like, zero.
This is really handy for a pre-revenue startup where every dollar counts. Want to spin up a completely separate version of the site to do some prototyping? No need to start a fleet of servers, then find out months later that your one-day test accidentally never got shut down.
And Lambda’s architecture is simple. Once you have Lambda handling your web requests and your batch processing (using Zappa’s background task and scheduling functions), the whole site just sort of…runs itself. No Kubernetes configurations, no load balancers, no auto scaling rules for the Celery cluster. It’s great when you can actually put time into adding features to the product, rather than setting up all the stuff that makes it run.
Not All Is Well in Lambda Land
We’ve given you something of Lambda’s pitch. Now it’s time for reality.
Does Lambda actually do what it promises? Yes, with some caveats. There are certain things that a Lambda function, as an ephemeral entity, cannot handle. In particular, a Lambda can’t service long-running connections such as a w̶e̶b̶ ̶s̶o̶c̶k̶e̶t̶ or SMTP session [correction: Amazon has added support for web sockets in Lambda]. Also, users should be aware that they will have less control of the platform their code is running on; your OS will be the stock AWS Linux image, period. Zappa can compile and integrate just about any Python package into your Lambda code, but you will still be giving up control, which may make some uneasy.
Then there are the “gotchas,” problems unique to running on Lambda which you will likely encounter at some point:
Functions with less RAM have slower CPU speed
This one perhaps should be obvious, given that it’s listed in the pricing guide (“you choose the amount of memory you want for your function, and are allocated proportional CPU power and other resources”). But from what we’ve seen around the web, this facet of Lambda still trips a lot of people up.
It’s an understandable issue: as a cost-conscious developer, your first instinct will be to configure your functions with the minimum amount of RAM they need (which for most web requests is very little). But doing this will result in seemingly strange performance problems. Some requests will finish quickly, while others will drag, degrading the user’s experience.
This seems to happen because whatever algorithm AWS uses to divide up CPU slices between Lambdas is not exact. A function with 128MB of RAM should get 1/4 the CPU time of one with 512MB of RAM, but oftentimes if you examine the logs you’ll see some executions of the same low-memory function finish in 5ms, while others take 500ms or more. We’re sure it all averages out in the end, but it can make for some confusing performance testing if you’re not already aware of it.
The fix here is simple: take both CPU and RAM into account when allocating resources to your Lambda functions, as well as the increased cost that will result.
Also, for short Lambdas, remember that you don’t save any money once the execution time drops below 100ms. Doubling the RAM to go from 90ms to 45ms executions may be good for user experience (in some cases), but since Lambda is billed in 100ms increments, it won’t save you a penny.
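To make the arithmetic concrete, here is a minimal sketch of the billing math described above. The 100ms increment is the one this article describes; the per-GB-second price is a placeholder we picked for illustration, so substitute the figure from the current pricing page before trusting any totals.

```python
import math

# Assumption: placeholder rate for illustration only, in USD per GB-second.
PRICE_PER_GB_SECOND = 0.0000166667
# Billing increment as described in this article.
BILLING_INCREMENT_MS = 100


def billed_ms(duration_ms: float) -> int:
    """Round a duration up to the next billing increment (minimum one increment)."""
    increments = max(1, math.ceil(duration_ms / BILLING_INCREMENT_MS))
    return increments * BILLING_INCREMENT_MS


def invocation_cost(duration_ms: float, memory_mb: int) -> float:
    """Compute-time cost of a single invocation; per-request fees are ignored."""
    gb_seconds = (memory_mb / 1024) * (billed_ms(duration_ms) / 1000)
    return gb_seconds * PRICE_PER_GB_SECOND


if __name__ == "__main__":
    print(f"90 ms at 512 MB:  ${invocation_cost(90, 512):.10f}")
    print(f"45 ms at 1024 MB: ${invocation_cost(45, 1024):.10f}")
```

Both durations round up to the same 100ms, so under these assumptions the faster 1024MB configuration costs twice as much per call as the 512MB one.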
Once we increased the resources available to our Lambdas, we saw our execution times go down. Our frontend uses React to make it feel snappy; clicking a link or a button won’t round-trip the server unless it actually needs to fetch or write to the DB. But still, we noticed HTTP requests from the browser were taking around 300ms to return. Not a deal-breaker for our use case, but could we improve it?
The answer, as it turns out, is “not really.”
Since our frontend is hosted on CloudFront (helpful for having those beefy React files load quickly), there is a delay for CloudFront to contact API Gateway, and another delay for API Gateway to contact Lambda. As far as we know, there is no real way to improve that situation. Essentially, if you want <50ms response times, then hosting your backend behind API Gateway is not for you — you need dedicated infrastructure.
AWS gives you two options for where your Lambdas are run: in a VPC, or directly connected to the internet. Odds are if you’re versed in cloud tech, you will pick a VPC by default. After all, it’s more secure and you will have lower latency and no bandwidth charges connecting to other services in your VPC. But Lambdas in a VPC can’t connect to outside services like S3 unless you install a NAT Gateway, which comes with always-on charges — not very serverless-ish [ed. note — as others have pointed out, AWS has S3 Endpoints for that case, though the larger point stands]. In the end, while there may be workarounds, it’s going to be much less of a hassle to either run the site completely outside or completely inside a VPC, which is a rather unfortunate limitation.
Cloudwatch Costs Money
When you get heavily into Lambda-based development, you’ll become acquainted with Cloudwatch quickly, since it’s the only way (besides rolling your own log solution from scratch) to get a granular record of which Lambdas are running, how many times and for how long. Sort of important information.
One drawback of Cloudwatch is that the UI is…well, there’s no other way to say it…horrible. In addition to its basic usability issues, our Cloudwatch console suddenly ground to a halt not long after we began using it, refusing to return the results of any searches for upwards of ten minutes at a time. After coming up empty with AWS support, we employed a workaround: a short script to export logs to S3, download them and collate them by date to put them in the right order.
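The collation step of such a script can be quite small. The sketch below is not our actual script; it assumes every downloaded log line begins with an ISO-8601 UTC timestamp followed by a space, and that multi-line messages (tracebacks, for instance) have already been joined into single lines. The function names are ours.

```python
from itertools import chain
from typing import Iterable, List


def timestamp_key(line: str) -> str:
    """Return the leading timestamp; ISO-8601 UTC strings of equal precision sort as text."""
    return line.split(" ", 1)[0]


def collate(streams: Iterable[Iterable[str]]) -> List[str]:
    """Combine log lines from several downloaded files into one chronological list."""
    return sorted(chain.from_iterable(streams), key=timestamp_key)


if __name__ == "__main__":
    file_a = ["2019-01-01T00:00:02.000Z second", "2019-01-01T00:00:00.000Z first"]
    file_b = ["2019-01-01T00:00:01.000Z middle"]
    for line in collate([file_a, file_b]):
        print(line)
```

Because Python’s sort is stable, lines that share a timestamp keep the order in which they were read.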
But making Cloudwatch usable is a double-edged sword, because one thing you’ll find is that it’s easy to send a lot of data there. The standard Python AWS API module, for example, is configured by default in Lambda to spew a large amount of debug logs to the console, all of which will end up in Cloudwatch. When you’re executing hundreds of thousands of Lambdas, this starts adding up, and the ingestion and storage costs will eat into your budget if you’re not careful to pare them down.
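One way to pare the noise down is to raise the level on the SDK’s loggers when the function module is first imported. This is a sketch using only the standard `logging` module; the logger names listed are the ones we believe the Python SDK and its HTTP layer use, so verify them against what actually shows up in your own logs.

```python
import logging

# Assumption: logger names used by the AWS SDK for Python and its HTTP stack.
NOISY_LOGGERS = ("boto3", "botocore", "urllib3")


def quiet_sdk_logging(level: int = logging.WARNING) -> None:
    """Drop DEBUG and INFO records from the SDK loggers before they are emitted."""
    for name in NOISY_LOGGERS:
        logging.getLogger(name).setLevel(level)


# Call once at import time so every invocation in this container benefits.
quiet_sdk_logging()
```

Zappa’s settings file also has a `log_level` option that is worth checking against the documentation for your version, since it affects how chatty the function is as a whole.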
All Lambda Functions Must Be Idempotent
We admit, this one took us by surprise: due to the way their distributed queues work, AWS can execute your Lambdas more than once for the same request.
To put this in perspective, in a “standard” web application, you probably don’t need to worry about whether an incoming POST request got duplicated somewhere down the line. If you get two POSTs, it means the client wanted to create two things, not one. Not so in Lambda, where executing both requests as if they were distinct will end up creating duplicate objects in your database.
It’s important to keep in mind that these duplicate requests are rare events. But when you’re executing enough Lambda functions, rare events will begin to happen with some regularity. There are a few work-arounds for dealing with this behavior, none of them too onerous, but it’s definitely something you’ll need to protect yourself against, especially in code (like billing) where executing the same process twice would be disastrous.
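One common work-around is to record a unique key for each request and refuse to act on a key that has already been recorded. The sketch below illustrates the pattern with an in-memory SQLite table purely so it can run anywhere; real Lambdas would need a store shared between containers (a database table with a unique constraint, for example). The `request_id` value and the helper names are hypothetical.

```python
import sqlite3
from typing import Callable


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the table that remembers which request IDs were already handled."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS processed (request_id TEXT PRIMARY KEY)")
    return conn


def run_once(conn: sqlite3.Connection, request_id: str, action: Callable[[], None]) -> bool:
    """Run action() only the first time request_id is seen; return True if it ran."""
    with conn:  # commits on success, rolls back if action() raises
        try:
            conn.execute("INSERT INTO processed (request_id) VALUES (?)", (request_id,))
        except sqlite3.IntegrityError:
            return False  # duplicate delivery: the primary key already exists
        action()
    return True


if __name__ == "__main__":
    store = open_store()
    charges = []
    print(run_once(store, "invoice-42", lambda: charges.append("charged")))
    print(run_once(store, "invoice-42", lambda: charges.append("charged")))
    print(charges)
```

If `action()` raises, the insert is rolled back so a retry can run it again. Side effects outside the database are not undone by that rollback, so the action itself still needs care.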
Running Out of Time Can Be Hard to Debug
All Lambdas have a time limit for their execution (30 seconds by default under Zappa, configurable up to f̶i̶v̶e̶ fifteen minutes [thanks Robert!]). This is good for preventing a deadlocked function from running up charges. But it can also be difficult to diagnose when a function is hanging, because Lambdas that go over their limit simply disappear. There’s no way we know of to throw an exception which would be trapped by our error alert system. You’re stuck looking in Cloudwatch again (sigh…) for Lambdas whose execution time matches your configured time limit.
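A partial mitigation, for functions that work through a list of items, is to check the remaining time yourself and raise an ordinary exception while there is still time to report it. The Python context object passed to a handler provides `get_remaining_time_in_millis()`; everything else below (`DeadlineExceeded`, `check_deadline`, `process`, `FakeContext`) is a hypothetical name for this sketch. It does nothing for a single call that blocks forever, since control never returns to the check.

```python
import time


class DeadlineExceeded(RuntimeError):
    """Raised by our own code shortly before Lambda would kill the function."""


def check_deadline(context, safety_margin_ms: int = 1000) -> None:
    """Raise while there is still enough time left for error reporting to run."""
    remaining = context.get_remaining_time_in_millis()
    if remaining < safety_margin_ms:
        raise DeadlineExceeded(f"only {remaining} ms left, aborting")


def process(item):
    """Hypothetical unit of work."""
    return item * 2


def handler(event, context):
    results = []
    for item in event["items"]:
        check_deadline(context)  # checked between units of work
        results.append(process(item))
    return results


class FakeContext:
    """Local stand-in for the Lambda context object, for testing only."""

    def __init__(self, budget_ms: int):
        self._deadline = time.monotonic() + budget_ms / 1000

    def get_remaining_time_in_millis(self) -> int:
        return max(0, int((self._deadline - time.monotonic()) * 1000))
```

Because the exception is raised from inside your own code, whatever error alerting wraps the handler gets a chance to see it.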
Executing a Lambda From Another Lambda is Slow
As mentioned above, EmailDelivery.com uses Zappa’s “background task” mechanism to launch tasks which would hold up a pending web request. This is an extremely useful ability which gives you something generally akin to Celery without the dedicated infrastructure.
The drawback, though, is that the Lambda “Invoke” API is not especially fast — call times to launch a function can run around 100ms, even for an asynchronous task. This means if you launch ten functions in a naive way, you’re looking at an entire second’s delay. So much for not holding up the web request!
There are ways to improve this situation. You can launch a task specifically to launch a large number of other tasks, or you can use threads to launch multiple tasks simultaneously. But if you’re looking for the raw speed of launching Celery tasks via something like RabbitMQ, it’s not going to happen without a server dedicated to that task.
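The thread approach looks roughly like this. To keep the sketch runnable without AWS credentials, `fake_invoke` is a stand-in that sleeps for about 100ms; in a real function you would pass a callable wrapping the SDK’s Lambda `invoke` call with `InvocationType="Event"` instead. The function names and payload shape are ours.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def fake_invoke(payload: dict) -> int:
    """Stand-in for the real Invoke call; sleeps to mimic ~100 ms of API latency."""
    time.sleep(0.1)
    return payload["task_id"]


def launch_all(payloads: Iterable[dict],
               invoke: Callable[[dict], int] = fake_invoke,
               max_workers: int = 10) -> List[int]:
    """Issue all invocations concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(invoke, payloads))


if __name__ == "__main__":
    start = time.perf_counter()
    launch_all([{"task_id": i} for i in range(10)])
    print(f"launched 10 tasks in {time.perf_counter() - start:.2f}s")
```

Threads work well here because the time is spent waiting on the network rather than on the CPU, so ten calls overlap instead of queuing up one after another.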
The Final Boss: Cold Starts
The above issues are annoying, especially if you don’t know they’re coming (you’re welcome). But, for us at least, they never rose above the level of inconvenience, and certainly they never had us considering moving much of our code off of Lambda. That is, until we encountered the big bad that can’t be killed: cold starts.
Lambda, like everything else in the cloud, runs on servers managed by Amazon. To make Lambda feasible, Amazon cannot keep everyone’s code “warm” and ready to serve requests at all times. This means that if your function hasn’t been run in a while, a pending request needs to wait for it to be initialized before it can be served.
How long a wait? Three seconds is not out of the ordinary for Python Lambdas in our experience, and this doesn’t seem to change based on the resources assigned. This is really bad, because users can’t wait three seconds for frontend requests to load, even once in a while. Once we saw this problem in action, we knew we couldn’t tolerate a user experience like that and we needed a workaround.
But there is no workaround. Zappa has a built-in “keep warm” feature, but this only keeps a single copy of your Lambda ready. Fine for small or toy sites, but once you start to scale up, AWS is going to need multiple copies of your Lambda at once, which means more cold starts. How many copies? It depends on your load, and due to the way email works (people want to send a lot of messages to their customers at the same time, that’s sort of the point), the number of simultaneous Lambdas can vary too much for any custom “warming” solution to eliminate the cold starts completely.
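A rough way to see why one warm copy is not enough is to estimate how many copies a given load needs at once: approximately the request rate multiplied by the average duration. The sketch below uses that rule of thumb; the traffic numbers are invented for illustration and are not measurements from our site.

```python
import math


def estimated_concurrency(requests_per_second: float, avg_duration_ms: float) -> int:
    """Approximate number of simultaneous executions needed to sustain the load."""
    return math.ceil(requests_per_second * avg_duration_ms / 1000)


def extra_copies_needed(requests_per_second: float, avg_duration_ms: float,
                        warm_copies: int = 1) -> int:
    """Copies beyond the warm ones; each of these has to be initialized from cold."""
    needed = estimated_concurrency(requests_per_second, avg_duration_ms)
    return max(0, needed - warm_copies)


if __name__ == "__main__":
    # Hypothetical quiet period versus a hypothetical burst.
    print(extra_copies_needed(5, 300))
    print(extra_copies_needed(400, 300))
```

A burst that needs over a hundred copies at once means over a hundred requests paying the cold start penalty, no matter how warm the first copy was.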
Having realized this, we knew that nothing the end-user touches could be hosted on Lambda after all. We couldn’t have the website suddenly grind to a halt just because something big happened on the backend. Luckily, the fact that Zappa implements the standard WSGI protocol helped us here. We had always been running the Zappa app in Gunicorn for local testing purposes, so we were able to wrap that command in a Dockerfile and move the portions of the API which support the UI into an ECS container in a couple hours or so.
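What makes this move cheap is that a WSGI application is just a callable, and nothing in it knows what is serving it. The toy app below is not our code; it is a minimal sketch of that idea. A module-level callable like `app` is the kind of object Zappa’s `app_function` setting points at, and the same object can be served by Gunicorn with a command of the form `gunicorn your_module:app`, where `your_module` is a placeholder.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A minimal WSGI application: same code under Lambda or in a container."""
    body = b'{"status": "ok"}'
    headers = [("Content-Type", "application/json"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def call_locally(wsgi_app, path="/"):
    """Invoke a WSGI app in-process, without any server, and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)  # fills in a valid default WSGI environment
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(wsgi_app(environ, start_response))
    return captured["status"], body


if __name__ == "__main__":
    print(call_locally(app))
```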
What is Lambda Good For?
We still use Lambda extensively on our backend, and to power parts of our API which aren’t user-facing (but which can be subject to severe loads at times). It’s an easy, cheap way to get a small website up and running, but until Amazon makes some changes to the way it’s implemented (and we’re not sure such a thing is possible), it can’t be used to actually run an end-user web application at scale.
It’s interesting to see how our relationship with Lambda has changed as the site has grown. We started off viewing it as a temporary convenience, something we could use on the low end but which we would likely outgrow as the dream of going “100% serverless” gave way to what we actually needed to accomplish.
Now, we view Lambda mainly in terms of what it can do as a high-end parallel architecture. What’s possible now that Amazon has given everyone the equivalent of a very accessible supercomputer which can be used in short bursts for relatively little money? How does this change the way we deal with ingesting and searching the massive amount of data email generates?
More detail on that will have to wait for a future article.