Why Container Devs Need to Care about re:Invent 2019’s Serverless Launches
Another re:Invent is over, and an army of weary AWS employees have flown back home, their launch tasks complete. The AWS Serverless teams, including AWS Lambda, Amazon API Gateway, AWS Step Functions, Amazon EventBridge and others were especially prolific in the run-up to, and during, re:Invent. But what does it all mean? With so many new releases, it can be hard to find a forest and not just a bunch of trees. That’s especially true this year, since much of what was launched wasn’t coolness for the already-drank-the-Kool-Aid crowd: This time, AWS is trying hard to win container users over to Serverless. It’s the dawn of a new “hybrid” era.
We’ve already seen this phenomenon in reverse: AWS Fargate and Google Cloud Run take containers and try to “serverlessize” them by removing details of the underlying infrastructure management, while preserving the things developers love about containers (like Docker images as a deployment and runtime abstraction). This year, the most pure-play Serverless compute offering out there, AWS Lambda, got in the hybrid game by offering more “server-like” features. Tim’s Prediction: We’re in for more bridge building from both sides from each of the Big 3 cloud vendors!
If you’re a developer thinking about building your next in-house service, public API, or mobile app, what’s different now versus a couple of months ago? If you’re already a heavy user of Serverless, what changed that you critically need to know about after this year’s re:Invent? I’ll decode these releases for both demographics to give you an easy cheat sheet for what to care about. (And for a look at what *isn’t* different, see my upcoming companion article on what AWS teams still need to do to help enterprise developers adopt Serverless architectures.)
First, to re:Iterate the most important thing you need to know: 2019’s AWS Serverless releases were mostly about getting k8s and other container/server developers to come to the Serverless party. Yes, these AWS teams love their existing users, but without this context, you might be forgiven for wondering why features that seem to blunt some of the magic of serverless (like provisioned capacity) made the re:Invent cut. It’s part of the long game AWS is playing: They’re systematically looking at every objection that non-serverless developers raise and going after them, with an eye towards eventually enabling all applications to be serverless.
To that end, the AWS team focused on the top objection that container and server devs level at Lambda: cold starts. AWS decided enough was enough, and went for a full-on mic drop here with a trifecta of solutions that represented their biggest and baddest releases in 2019:
- Zero latency (aka “ENI-less”) VPC access. Don’t know what an ENI is? Don’t worry, it doesn’t matter. What does matter is that scaling up a Lambda function that needed to use resources behind a VPC used to take up to 30 seconds on each initial call to get that network config established. Now that’s more like 30 microseconds on the first call. So if you’ve heard someone say, “Lambda doesn’t work when you use VPCs”; that’s now outdated info.
If you’re not an existing user: PAY ATTENTION TO THIS! You probably had a buddy steer you away from using Lambda because of this issue (or perhaps you did the steering). It’s time to forget those old tapes, because Lambda is now just as fast whether you use VPCs or not. (BTW, there really isn’t any such thing as “non-VPC” Lambdas; it’s just a question of whose VPC it runs in — the Lambda service’s multi-tenanted VPC or your own.)
If you’re an existing user: If you use the VPC feature and your Lambdas are synchronous and they’re latency sensitive, then rejoice. Otherwise, you can ignore this launch except to feel good that if you hit that use case down the road you’re covered.
- RDS Proxies. Heard of Aurora Serverless? It’s essentially built-in autoscaling + an HTTP frontend on an AWS-flavored SQL database. RDS Proxies are basically the latter applied to Amazon RDS databases. Or in simple terms: If you want to use Lambda with an RDS database, it used to suck (because connection pools and Lambda didn’t play nicely; also VPC) and now it doesn’t suck. Caveat: Still in preview, only MySQL today, etc. but expect this to rapidly get to production grade.
If you’re not an existing user: PAY ATTENTION TO THIS! You probably heard “Lambda doesn’t work well with SQL databases” at some point, so if you didn’t have the time/energy/will to migrate to Amazon DynamoDB you might have ruled Serverless out as an option early on. That’s no longer the case. However, note that SQL databases still won’t “impedance match” for overall throughput as well as a Lambda + DynamoDB combo play will. RDS Proxies will keep Lambda from blowing up your relational database; it doesn’t send magical unicorns over to RDS to auto-scale capacity, so Aurora Serverless and DynamoDB still have more built-in “serverless scalability” if you have the flexibility to choose one of the latter two.
If you’re an existing user: Chances are you’re already using DynamoDB or perhaps Aurora Serverless because they were the preferred solutions. If you are using RDS, you probably have some homegrown connection pooling and/or concurrency control mess that you should now throw away in favor of letting AWS do this for you.
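To see why RDS Proxy matters, it helps to remember the Lambda-side idiom it makes safe: open the database connection once per container, outside the handler, and let the proxy multiplex those container-level connections onto a small real pool. Here’s a minimal, runnable sketch of that reuse pattern — the stub factory stands in for a real MySQL driver pointed at a (hypothetical) proxy endpoint:

```python
# Sketch of the Lambda connection-reuse pattern that RDS Proxy makes safe.
# A real handler would use a MySQL driver (e.g. pymysql) connecting to the
# RDS Proxy endpoint; a stub factory stands in here so the sketch runs anywhere.

CONNECT_COUNT = 0

def connect_to_proxy():
    """Stand-in for driver.connect(host=<rds-proxy-endpoint>, ...)."""
    global CONNECT_COUNT
    CONNECT_COUNT += 1
    return {"connected": True}

# Initialized once per container, OUTSIDE the handler: warm invocations reuse
# it, and RDS Proxy multiplexes many such container-level connections onto a
# bounded pool of real database connections.
connection = connect_to_proxy()

def handler(event, context):
    # Every warm invocation reuses the container-scoped connection.
    return {"used_cached_connection": connection["connected"]}

# Simulate three warm invocations in the same container.
for _ in range(3):
    handler({}, None)

print(CONNECT_COUNT)  # the "driver" connected only once
```

The point of the sketch: without a proxy, a thousand concurrent containers means a thousand database connections; with one, the same code gets pooled safely.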
- Provisioned Capacity. The ability to tell Lambda that you need N instances of a function, warmed and ready to use, for a given amount of time. Or to put it another way: An SLA from AWS to make up to N simultaneous calls to a given function with a guarantee that they will be 1) available (you won’t hit a provisioning limit) and 2) low latency (you won’t have to wait for Lambda to get them ready). Note that you can still make more than N calls in parallel; you just won’t get the provisioned capacity guarantees for any “excess” calls.
If you’re not an existing user: PAY ATTENTION TO THIS! If there were a classic “I use k8s/containers/servers because Lambda can’t …” lament, this was it. If you needed predictable low latency or predictable burst capacity with Lambda, getting that required building out your own “prewarmer”. It was a messy job that existing customers vehemently disliked and an obvious gap that caused potential serverless adopters to turn away and head straight back to containerland. Now, getting the combo play of low-latency, guaranteed capacity, just like you can with servers, is as easy as a config setting on Lambda. This is the biggest game changer of re:Invent for folks who aren’t already using Serverless.
If you’re an existing user: Use Lambda for async workloads (S3, SNS) or polled event sources (SQS, Kinesis, DynamoDB streams)? Then you almost certainly shouldn’t care about (and shouldn’t use) this feature. But if you use Lambda for something like a flash sale where you need predictably low latency and/or guaranteed instantaneous high volume capacity, throw away your nasty DIY prewarmers today.
(By the way, Lumigo has done a fantastic deep dive writeup on this feature.)
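For the curious, “a config setting” really is all it is. Here’s a minimal sketch of the request you’d hand to boto3’s `put_provisioned_concurrency_config` (the real Lambda API for this feature); the function and alias names are placeholders, and the actual AWS call is left commented so the sketch runs anywhere:

```python
# Sketch: asking Lambda for N warm, ready-to-invoke instances of a function.
# "checkout" and "live" are hypothetical names for illustration only.

request = {
    "FunctionName": "checkout",               # hypothetical function name
    "Qualifier": "live",                      # targets a published version or alias
    "ProvisionedConcurrentExecutions": 50,    # N instances warmed before traffic arrives
}

# The real call (requires AWS credentials and an existing function alias):
# import boto3
# boto3.client("lambda").put_provisioned_concurrency_config(**request)

print(request["ProvisionedConcurrentExecutions"])
```

Calls beyond the 50 still work; they just fall back to ordinary on-demand scaling without the latency guarantee.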
The fine print. #1 and #2 are uniform goodness — everyone who uses them will benefit. But #3 is different — You shouldn’t use it if you don’t truly need it, because it destroys one of the key economic advantages of serverless compute: 100% guaranteed utilization. There’s no way to say this nicely: Paying for unused capacity makes Lambda less serverless. This is probably why the team formally named it “provisioned concurrency”, but don’t be confused: You’re paying for what will probably be a lot of unused capacity, just as you do with servers and containers.
Will provisioned capacity get abused? Absolutely — especially by developers and operators who are used to provisioning servers and really wanted the equivalent functionality in the Serverless realm. Premature optimization, developer hubris, and confirmation bias will undoubtedly result in way more provisioning than is truly necessary. Will this cause developers to defer important architectural changes, such as slimming down dependencies and startup times, where Lambda previously provided useful “design pressure”? Yes, also unfortunately true. But it will also result in hordes of new developers adopting Serverless who otherwise would have spent more years using containers and servers, so it’s a reasonable tradeoff. Bottom line: Think about Lambda provisioned capacity the same way you think about Google Cloud Run, AWS Fargate, and premium Azure Functions: These are all hybrid solutions that aren’t “pure play” Serverless, but they provide an important bridge for meeting a lot of developers where they are today. If it helps you operate one less server, don’t feel guilty about getting started there…just know you’re likely not at the end of your journey yet.
Ok, so much for the cold start trifecta. What else got launched?
- HTTP APIs. It’s basically an API Gateway cost reduction for the common case.
If you’re not an existing user: PAY ATTENTION TO THIS! You probably looked at Amazon API Gateway once upon a time and decided that $3.50 per million API calls was too rich for your blood, and went back to running your own frontend. At $1 per million calls, you might feel differently, and let me just say: Not running your own frontend fleet is a joy that keeps giving.
If you’re an existing user: Any AWS price cut is nice; getting 70% off what might be one of the more expensive parts of your bill is pretty cool. But make sure to look at the fine print: The tradeoff here is that a bunch of the advanced API Gateway features aren’t available in this new “economy seating” variant. If you’re using Velocity templates, Lambda authorizers, or other advanced features, you’re going to be gritting your teeth once you realize this lower price doesn’t apply to you.
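The back-of-envelope math on those per-million prices is worth doing before you celebrate. A quick sketch (request charges only — this deliberately ignores the free tier, data transfer, and caching):

```python
def monthly_gateway_cost(calls_per_month, price_per_million):
    """Request-based cost only; ignores free tier, data transfer, and caching."""
    return calls_per_month / 1_000_000 * price_per_million

calls = 500_000_000  # e.g. 500M API calls per month

rest_api = monthly_gateway_cost(calls, 3.50)  # classic REST API pricing
http_api = monthly_gateway_cost(calls, 1.00)  # new HTTP API pricing

print(rest_api, http_api)  # 1750.0 vs 500.0 — roughly 71% off
```

At small volumes the difference is noise; at hundreds of millions of calls it’s real money, which is exactly the “common case” this launch targets.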
- Express Workflows. Run a simple workflow (choreography) at scale and on the cheap.
If you’re not an existing user: You can probably ignore this. It won’t make you change your mind about using (or not using) serverless. It’s aimed at price- and scale-conscious existing users. If you do need/use workflows on AWS, just know there are now two flavors to choose between, not including the (really old) SWF service, which technically makes three.
If you’re an existing user: This might feel like a price cut or “limit lifting”, but in fact the semantics are so different that it’s better to think of this as another service entirely. What is it? Imagine you had a bunch of activities to choreograph, but you were happy to have the whole thingamabob succeed or fail as a unit. You could just code it up as a Lambda function…but there’s this cute workflow language, ASL (the Amazon States Language), and a nice JSON abstraction that Step Functions defined…wouldn’t it be cool if AWS let you run the Step Functions ASL interpreter as a built-in Lambda language? That’s basically what Step Functions Express Workflows are. Now that you understand what it is, though, should you use it?
AWS says that Express Workflows are for programmatic workflows. What does that mean? Let’s say that you have an S3 bucket with a Lambda function set up as an event handler. When a new file arrives, you have a bunch of things to do: Call an API, update a database, put some stuff in a queue and wait for it to drain, etc. But you might get a lot of files (and thus events) firing at once, potentially exceeding either the ability of a “normal” Step Function to launch (rate limiting) or your willingness to pay for it (cost concerns). The new Express Workflows let you launch much faster and for a lot less money. The tradeoff? They don’t offer the reliability of existing Step Functions — they have at-least-once semantics, versus exactly once, they don’t do any internal checkpointing, and they have a 5 minute limit on total running time versus a year with normal Step Functions.
So the big question: Are these limitations worth it, versus just writing conditionals and loops in your favorite programming language? This is somewhat akin to the “CloudFormation versus CDK” debate raging right now: If you love declarative semantics and purpose-built DSLs, you’re going to love Express Workflows. If you think the CDK was a breath of fresh air and you value Intellisense in VS Code above all else, you’re going to find Express Workflows crazy. If you’re seeking practical advice, I’d say the breakeven is when you have more than 3 “things to do” in a function, especially if they involve other AWS services: Fewer than that, and just writing the code and avoiding the use of another service is probably best. More than that, and you’re getting to the point where your function is dominated by a workflow, and separating the choreography from the computational logic starts to sound like a good 12-factorish separation of concerns.
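If you’ve never seen the declarative flavor, here’s a minimal ASL sketch of the file-arrival choreography described above. The state names and Lambda ARNs are placeholders; an Express Workflow executes this same JSON, just with cheaper, faster, at-least-once semantics:

```python
import json

# Minimal Amazon States Language (ASL) sketch of the "file arrived"
# choreography: call an API, then update a database, as one unit.
# State names and ARNs are hypothetical placeholders.
state_machine = {
    "Comment": "Process a newly arrived S3 object",
    "StartAt": "CallApi",
    "States": {
        "CallApi": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:call-api",
            "Next": "UpdateDatabase",
        },
        "UpdateDatabase": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:update-db",
            "End": True,
        },
    },
}

# This JSON document is what you'd hand to Step Functions, Express or classic.
print(json.dumps(state_machine, indent=2))
```

Whether that document or twenty lines of imperative code is “better” is exactly the DSL-versus-CDK taste test above.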
- Lambda Destinations (and retry controls and improved Kinesis controls, …). Lambda added a whole bunch of controls to make working with event chains and polled event sources easier, safer, more reliable, and more debuggable.
If you’re not an existing user: You can ignore all this; it probably won’t make you change your mind about using (or not using) serverless. Just know that AWS continues to use their operational heritage (and first mover advantage in serverless) to continue breaking new ground in event handling.
If you’re an existing user: PAY ATTENTION TO THIS. You almost certainly have existing solutions and designs that absolutely need to get reconfigured as a result of at least one of the following improvements:
- Customizable retry count (0, 1, or 2 retries; it used to be a fixed 2 retries, for 3 total attempts)
- Retry TTL, to avoid out-of-date processing in the event of an outage or delay
- Kinesis and DynamoDB stream worker fanout (beyond the previously built-in 1:1 ratio)
- Poison pill recovery for Kinesis and DynamoDB streams (the somewhat professorially named, “Bisect on Failure”)
- SNS DLQs (technically an SNS feature, but one which effectively makes the use of SNS for Lambda fanout reliable, since you can hook that queue up to a Lambda function easily)
- Built in FIFO SQS for Lambda
- Event chaining and generalized DLQ enablement, via Lambda Destinations. This is a big one, so let’s drill into some of the cool things you can do with it:
Generalized error handling: Wish you could do something other than just dropping an event into an SQS queue when an asynchronously invoked Lambda has a problem? Now you can (reliably) execute another Lambda function to handle the error instead.
Simple continuations: Not quite done with processing that Lambda function? Call another one when it finishes to carry on! (Sorry, it has to be a different one…no direct tail recursion allowed ;-) Think of this option as the complement of error handling for async invokes: It’s a success handler.
Callbacks: Wish you could get called back when async functions complete without the overhead of wrapping it in a Step Function or having to hack it into your source code? Now you can set up a Lambda Destination to do just that.
Enhanced logging or tracing hooks: Want to record some information when an async function runs without having to change its source code? Turn on Lambda Destinations and hook away.
These are some pretty cool patterns (especially those of the “post invoke hook” flavor) that will take developers a while to understand and adopt, but they greatly expand the range and capability of event-driven chains using Lambda…for existing users, this is definitely a “sleeper hit” release that will keep on giving!
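Several of the improvements above (retry count, retry TTL, and Destinations) land in a single API call. Here’s a sketch of the request you’d pass to boto3’s `put_function_event_invoke_config` (the real Lambda API for async invoke config); the function name and destination ARNs are placeholders, and the AWS call itself is left commented so the sketch runs anywhere:

```python
# Sketch: wiring retry controls and Lambda Destinations in one place.
# Function name and ARNs are hypothetical placeholders.

request = {
    "FunctionName": "image-resizer",        # hypothetical async-invoked function
    "MaximumRetryAttempts": 1,              # was previously fixed at 2
    "MaximumEventAgeInSeconds": 600,        # retry TTL: drop events older than 10 minutes
    "DestinationConfig": {
        # Success handler / continuation: chain to another function.
        "OnSuccess": {
            "Destination": "arn:aws:lambda:us-east-1:123456789012:function:next-step"
        },
        # Error handler: route failures to a queue (or another function).
        "OnFailure": {
            "Destination": "arn:aws:sqs:us-east-1:123456789012:resize-errors"
        },
    },
}

# The real call (requires AWS credentials and an existing function):
# import boto3
# boto3.client("lambda").put_function_event_invoke_config(**request)

print(sorted(request["DestinationConfig"]))
```

Note that this is per-function (and per-qualifier) configuration — no source-code changes required, which is what makes the logging/tracing-hook pattern above so painless.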
- CloudWatch Synthetics. What AWS calls “canaries” — customer-proxy monitors that are managed for you.
If you’re not an existing user: It won’t change your mind about serverless one way or the other, though you might find it independently useful for your other microservices.
If you’re an existing user: No need to rush, but you should consider trying this for “outside-in” uptime and latency testing for your key user flows. I’d expect AWS to keep investing here; canaries are a core part of how AWS teams think internally about operational hygiene (and a H-U-G-E part of Amazon’s DevOps culture), so getting in on the ground floor on this feature is probably worth it.
What they are: Like Express Workflows, Synthetics are basically a custom Lambda runtime plus a cron setting plus a little built-in wire-up to CloudWatch metrics plus a dashboard to make it easy to consume. There are a few templates for different kinds of testing. Today, you could do this almost as easily yourself; the managed service just makes it a little easier to get started and monitor in an ongoing fashion. Over time, though, I’d expect AWS to put more muscle behind this, with automation for testing from multiple regions, scaling controls (think artillery), etc. that add value (and which would be increasingly expensive to DIY).
- Event schema discovery and registry. Users of EventBridge can now use it not just as an event bus but also as a service registry for schemas and as a way to dynamically detect schema updates.
If you’re not an existing user: You almost certainly aren’t using EventBridge anyway, so you can ignore what’s going on here. (But see previous comment about AWS being the thought leader on async design patterns for the day when you decide you want some of that.)
If you’re an existing user: If you were wondering how EventBridge intends to differentiate itself from SNS, this release telegraphs the answer: typed messages with strong governance and auditing features at a “semantic” level of abstraction. But forward-looking reveal aside, it’s still early days here, and with neither type checking for Lambda nor a completed open standard for cross-cloud events, the schema discovery and registry features are of somewhat limited utility. Summary: Give this one some more bake time, but keep an eye on it.
- Pro forma stuff. Language updates, SAR author badges, percentiles for duration metrics, X-Ray tracing improvements, improved CloudWatch dashboarding, increased limits for classic Step Functions, and more.
If you’re not an existing user: You can ignore all this.
If you’re an existing user: Take a look at your leisure. The language updates are probably the most important to prioritize.
Other re:Invent 2019 Takeaways
The 2019 launches that weren’t (or weren’t what we might have hoped for) for current and aspiring Serverless developers.
The biggest missing launch: EFS/Lambda integration
You know that moment on the Thursday of re:Invent, around 10:30am or so, when you finally accept the fact that the launch you were hoping for isn’t going to arrive this year? It’s like not seeing the shape of a special present you were hoping for under the Christmas tree.
EFS in Lambda should be a match made in heaven: An effectively infinite disk drive mounted to the world’s largest pool of CPU silicon? Wow, what could be better? Apparently a lot, since EFS integration once again didn’t launch this year. Sad pandas.
The biggest launching miss: AWS Wavelength
I’d love the hassle of deploying, managing, and monitoring thousands of tiny EC2 servers mounted on carrier equipment with the need to rapidly re-route traffic for mobile users…said no one, ever. With the Verizon partnership, AWS finally had the hardware and networking reach to truly do global “edge” functions…but instead they reverted to their most cumbersome, hardest to manage, and least fault- and location-tolerant compute abstraction, EC2. Oh well, there’s always re:Invent 2020.