The JAWS Stack: The Power and Peril of Infinite Scalability
Look Ma, no server!
No IaaS EC2 instances to manage and maintain. No idle CPU cycles. No PaaS environments to configure and deploy. No monolithic code deployment, or failure.
Your application lives as a collection of stateless micro-services implemented as AWS Lambda functions.
JAWS By Another Name
The first commit in the JAWS Stack GitHub repo was only four months ago, but the project already has a small, passionate community (with a JAWS Gitter room) that has produced a roadmap and specification for JAWS V1.0.
The Power of JAWS
Your JAWS application “server-side” is a collection of micro-service functions that only cost money when a function is actually running — no more idling instances. In development you pay micro-cents.
And then as you ramp up your load with thousands, then tens of thousands, then hundreds of thousands, then millions of users, the Amazon AWS infrastructure handles the scaling automatically. Like magic. And you pay for your cheap-as-chips Functions as a Service (FaaS — I made that up too) out of all that recurring subscription revenue you are collecting, or out of your VC funding if you are a pure market-share play, or from your day job or savings if you’re doing this out of your garage.
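To put rough numbers on "only pay while a function is running", here is a back-of-the-envelope sketch. The two prices are assumptions on my part (commonly published per-request and per-GB-second rates, not figures from this post), the workloads are made up, and any free tier is ignored. Check the current AWS pricing page before trusting the output.

```python
# Assumed prices, NOT taken from this article; verify against AWS pricing.
PRICE_PER_REQUEST = 0.20 / 1_000_000   # USD per invocation (assumed)
PRICE_PER_GB_SECOND = 0.00001667       # USD per GB-second of compute (assumed)


def lambda_cost(invocations: int, avg_seconds: float, memory_mb: int) -> float:
    """Estimate a Lambda bill in USD, ignoring any free tier."""
    gb_seconds = invocations * avg_seconds * (memory_mb / 1024)
    return invocations * PRICE_PER_REQUEST + gb_seconds * PRICE_PER_GB_SECOND


if __name__ == "__main__":
    # Hypothetical quiet development day: 200 short calls at 128 MB.
    print(f"dev day: ${lambda_cost(200, 0.2, 128):.6f}")
    # Hypothetical runaway: 100,000 calls that each stay alive for 60 seconds.
    print(f"runaway: ${lambda_cost(100_000, 60, 128):.2f}")
```

The point of the second call is that the same formula that makes development nearly free has no upper bound: cost grows linearly with invocations and with how long each one stays alive.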
Adding Stateful Workflows “Server-side”
I added Amazon Simple Workflow Service (SWF) to the mix because my application is purely server-based (it’s a middleware thing), so there is no browser-side component to maintain or orchestrate workflow state.
Amazon SWF provides a state management service that passes a workflow history to stateless “deciders” who determine what to do next based on that history.
To implement my deciders in AWS Lambda I used swf-lambda-decider, and triggered them by connecting them via Amazon Simple Notification Service (SNS) to Eric Hammond’s Unreliable Town Clock — a community service 15-minute pulse heard around the world.
SWF workers normally poll the workflow service for tasks, but Lambda functions cannot poll: they have to be invoked. The latest release of SWF and the AWS SDK, however, allows deciders to invoke Lambda worker functions directly. Cloud-based functions invoking other cloud-based functions.
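A minimal sketch of what a stateless decider looks like, assuming a workflow with a single Lambda worker. This is not the code from swf-lambda-decider or from my application. The event and decision type names follow the SWF API, while the function name `my-worker`, the activity id `work-1`, and the timeout are hypothetical. The decider keeps no state of its own: everything it knows comes from the history it is handed.

```python
from typing import Any


def decide(history: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the next SWF decisions, derived purely from the workflow history."""
    event_types = [event["eventType"] for event in history]

    if "LambdaFunctionFailed" in event_types or "LambdaFunctionTimedOut" in event_types:
        return [{
            "decisionType": "FailWorkflowExecution",
            "failWorkflowExecutionDecisionAttributes": {"reason": "worker failed"},
        }]
    if "LambdaFunctionCompleted" in event_types:
        return [{
            "decisionType": "CompleteWorkflowExecution",
            "completeWorkflowExecutionDecisionAttributes": {"result": "done"},
        }]
    if "LambdaFunctionScheduled" in event_types:
        # A worker is already in flight: schedule nothing new on this pass.
        return []
    return [{
        "decisionType": "ScheduleLambdaFunction",
        "scheduleLambdaFunctionDecisionAttributes": {
            "id": "work-1",               # hypothetical task id
            "name": "my-worker",          # hypothetical Lambda function name
            "startToCloseTimeout": "60",  # seconds, passed as a string
        },
    }]
```

The "already scheduled, so do nothing" branch matters: without a check like it, every decision pass over the same history schedules another worker.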
Infinite scalability, combined with the ability of cloud-based functions to invoke other cloud-based functions.
What could possibly go wrong?
Houston, We Have a Problem
I had an error in a decider, and an error in my polling logic.
I’ll spare you the (embarrassing) details. The outcome, however, serves as a cautionary tale.
I had two feedback loops: one between my polling logic and my decider, and one between the decider and a Lambda worker function. The result: 800 functions invoked in under a second, each one spawning another one.
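One common guard against this kind of chain is a hop counter carried in the payload: every function increments it before invoking the next one and refuses once a ceiling is reached. The sketch below is a local simulation under that assumption. `MAX_HOPS`, the `hops` field, and the `invoke` stand-in are all hypothetical names, and nothing here calls AWS.

```python
MAX_HOPS = 3  # hypothetical ceiling on how deep a chain of invocations may go


class HopLimitExceeded(Exception):
    """Raised when a payload has already travelled through too many functions."""


def next_event(event: dict) -> dict:
    """Build the payload for a downstream call, incrementing the hop count."""
    hops = event.get("hops", 0) + 1
    if hops > MAX_HOPS:
        raise HopLimitExceeded(f"refusing to invoke: hop {hops} exceeds {MAX_HOPS}")
    return {**event, "hops": hops}


def handler(event: dict, invoke) -> int:
    """Toy worker that always tries to invoke itself again.

    Returns the number of handler calls that ran from this point down.
    """
    try:
        payload = next_event(event)
    except HopLimitExceeded:
        return 1  # this call ran, but the chain stops here
    return 1 + invoke(payload)


def run_chain() -> int:
    """Run the self-invoking worker locally and count the total calls."""
    def invoke(payload: dict) -> int:
        # Local stand-in for a real Lambda invocation.
        return handler(payload, invoke)

    return handler({}, invoke)
```

Without the guard this toy worker would recurse until the interpreter gave up. With it, the chain ends after `MAX_HOPS + 1` calls no matter what the logic bug is.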
I found out because I had my functions reporting on what they were doing up there in the cloud via Amazon SNS connected to Slack. The first sign that something was wrong was my browser hanging, as it got hundreds, then thousands of reporting messages in the Slack channel.
Then my machine went into swap death as the browser chewed up all the available RAM. I managed to kill the browser and reopen it.
Do you know how many seconds it takes to open a browser window and delete a workflow execution from the AWS console? Too long.
Even with the workflow execution halted, and no new functions being spawned by the initial feedback loop in the poller, the running functions continued to invoke other functions.
The only way to kill that was to delete the Lambda functions themselves, to halt further invocations.
And deciders, by design, hold their HTTP connection open for 60 seconds. So each of those thousands of suckers was running for a full minute, even after I deleted the Lambda functions.
Amazon has a rate limiter on Lambda functions, and throttles them when they get out of hand. I’m not sure at what point, or for how long, or if that is configurable, but it’s definitely something I’ll be investigating.
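For what it's worth, one configurable lever in the Lambda API as it stands today is per-function reserved concurrency, which as far as I know postdates the incident described here. Setting it to zero throttles all new invocations without deleting the function. The sketch below assumes a boto3 Lambda client is passed in. The helper names and the function names in the usage comment are hypothetical; `put_function_concurrency` and `delete_function_concurrency` are the boto3 calls.

```python
def throttle_to_zero(lambda_client, function_names):
    """Emergency brake: reserve zero concurrency so new invocations are throttled."""
    for name in function_names:
        lambda_client.put_function_concurrency(
            FunctionName=name,
            ReservedConcurrentExecutions=0,
        )


def release(lambda_client, function_names):
    """Remove the reserved-concurrency setting once the bug is fixed."""
    for name in function_names:
        lambda_client.delete_function_concurrency(FunctionName=name)


# Usage (requires boto3 and AWS credentials; function names are hypothetical):
#   import boto3
#   client = boto3.client("lambda")
#   throttle_to_zero(client, ["my-decider", "my-worker"])
```

Note that this only stops new invocations. Functions that are already running carry on until they finish or time out, so it is a brake, not an off switch.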
The Perils of Infinite Scalability
I remember a guy who left the EC2 instances up after a demo involving clustering at a conference. The talk ended, and everyone went to the pub. At the end of the month he got a “massive” bill (which the company paid for — luckily for him).
With Lambda functions scaling limitlessly, you don’t have the overhead of an EC2 instance — which means you don’t pay for execution cycles when you don’t use them, and there is no ceiling for you to run into when your application scales out of hand.
Instead of sending the execution environment into swap death, it’s more likely to send your bank account into overdraft.
Don’t let that stop you. I’m not letting it stop me.
But: with great power comes great responsibility. Practice safe coding out there on the Internets.