Last January, we published a paper looking at the state of the art in serverless computing and Function-as-a-Service (FaaS) systems, which kicked up discussion and controversy. We were excited about the potential of serverless computing to revolutionize cloud programming, but we were also clear about the limitations of commercial FaaS offerings, which made it difficult or impossible to implement a host of natural applications.
Our main concerns with FaaS a year ago were as follows: (1) limited function execution lifetimes; (2) high-latency and low-bandwidth IO; (3) no inbound network connections; and (4) no specialized hardware (e.g., GPUs, TPUs). All of these limitations were frustrating, but some seemed transient: The maximum execution time limits on AWS Lambda had been increasing from 1 to 5 and then to 15 minutes, and we figured it was only a matter of time before Google made TPUs available to serverless functions (similarly with AWS Inferentia on Lambda). However, we felt that the constraints around expensive IO and direct communication had less obvious fixes and required some careful thought.
Our goal was never to criticize from the sidelines, and over the past year we’ve been hard at work on new designs that address the more fundamental limitations.
Designing a New FaaS System
As we enter the new decade, we’re excited to take the wraps off a new FaaS platform we’ve built called Cloudburst. Our goal with Cloudburst is to tackle what we view as the most fundamental challenge in serverless computing: efficient, low-latency, and consistent access to state. Cloudburst enables three different kinds of state sharing that are infeasible on existing FaaS systems: function composition, direct communication, and access to shared, mutable data storage.
One of the core benefits of serverless infrastructure is the architectural disaggregation of compute and storage, which enables cloud providers to efficiently bin-pack compute tasks and data objects into shared physical resources. This allows cloud providers to increase utilization and leads to lower costs and simpler operations for users. Unfortunately, this disaggregation also introduces the network barrier that leads to the limitations described above.
The Cloudburst architecture is based on the principle of logical disaggregation with physical colocation (LDPC). The core idea is to maintain separate compute and storage resources, but to enable low-latency state access by introducing caches that live on the same physical machines as the computation. The provider can dynamically provision as much compute as needed with onboard capacity for “hot” data, while separately provisioning long-term storage intelligently.
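To make the LDPC idea concrete, here is a minimal sketch of a cache that sits on the same machine as a function executor and fronts a separate storage tier. This is an illustration under our own assumptions, not Cloudburst’s actual API: the names `RemoteStore`, `ColocatedCache`, and `run_function` are hypothetical, the storage tier is simulated with an in-process dictionary, and cache consistency and eviction are ignored entirely.

```python
class RemoteStore:
    """Stand-in for the disaggregated storage tier (a networked KVS in practice)."""

    def __init__(self):
        self._data = {}
        self.remote_reads = 0  # counts simulated network round trips

    def get(self, key):
        self.remote_reads += 1
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class ColocatedCache:
    """Hypothetical cache living on the same machine as the function executor."""

    def __init__(self, store):
        self._store = store
        self._local = {}

    def get(self, key):
        # Hot keys are served locally, with no network hop.
        if key in self._local:
            return self._local[key]
        # On a miss, fetch from the storage tier and keep a local copy.
        value = self._store.get(key)
        if value is not None:
            self._local[key] = value
        return value

    def put(self, key, value):
        # Write-through: update the local copy and the storage tier.
        self._local[key] = value
        self._store.put(key, value)


def run_function(cache, fn, *keys):
    """Resolve each key argument through the local cache, then invoke fn."""
    return fn(*(cache.get(k) for k in keys))


def total(values):
    return sum(values)


store = RemoteStore()
store.put("weights", [0.5, 2.0])
cache = ColocatedCache(store)

first = run_function(cache, total, "weights")   # cache miss: one remote read
second = run_function(cache, total, "weights")  # cache hit: no remote read
print(first, second, store.remote_reads)
```

Compute and storage remain logically separate here (the executor only ever talks to `ColocatedCache`), but repeated reads of a hot key never leave the machine.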
Without getting too far into the weeds, this architecture allows us to enable state sharing in two key ways: (1) frequently accessed keys in the underlying key-value store (KVS) are likely to be served from the local cache on repeated access; and (2) network messages and function results can be sent directly from one compute node to another in the common case, falling back to storage queues when a direct connection is not possible. This raises a whole host of interesting challenges around scheduling, identity, cache consistency, and so on. If you’re interested in learning more, we just published a full paper on arXiv that describes the Cloudburst architecture in detail, which you can find here.
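The second mechanism, direct messaging with a fallback to storage queues, can be sketched in a few lines. Again, this is a toy model under stated assumptions rather than the real implementation: `Node`, `StorageQueues`, and the `reachable` flag are hypothetical names, and both the network and the storage tier are simulated in a single process.

```python
from collections import defaultdict, deque


class StorageQueues:
    """Stand-in for per-recipient message queues kept in the storage tier."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def push(self, recipient, msg):
        self._queues[recipient].append(msg)

    def drain(self, recipient):
        msgs = list(self._queues[recipient])
        self._queues[recipient].clear()
        return msgs


class Node:
    """Hypothetical compute node that prefers direct delivery."""

    def __init__(self, name, queues):
        self.name = name
        self.reachable = True  # simulated network reachability
        self.inbox = []
        self._queues = queues

    def deliver(self, msg):
        # Simulates accepting an inbound connection from a peer.
        if not self.reachable:
            raise ConnectionError(f"{self.name} is unreachable")
        self.inbox.append(msg)

    def send(self, target, msg):
        # Common case: hand the message straight to the target node.
        try:
            target.deliver(msg)
            return "direct"
        except ConnectionError:
            # Fallback: park the message in the target's storage queue.
            self._queues.push(target.name, msg)
            return "queued"

    def poll(self):
        # Pick up any messages that arrived via the storage fallback.
        self.inbox.extend(self._queues.drain(self.name))


queues = StorageQueues()
a = Node("a", queues)
b = Node("b", queues)

path1 = a.send(b, "result-1")  # delivered directly
b.reachable = False
path2 = a.send(b, "result-2")  # falls back to the storage queue
b.reachable = True
b.poll()
print(path1, path2, b.inbox)
```

The sender does not need to know in advance which path a message will take; the storage queue only comes into play when direct delivery fails.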
Initial results show that we can outperform standard FaaS architectures using systems like AWS Lambda, AWS DynamoDB, and Redis by orders of magnitude on both synthetic tasks (e.g., simple function composition, data-intensive benchmarks) and real-world workloads (e.g., prediction serving pipelines, a Twitter-style social network). The graph below, for example, shows Cloudburst’s performance on a three-stage prediction serving pipeline compared to a single-threaded Python process, AWS Lambda, and AWS SageMaker (a purpose-built prediction-serving system). Unlike SageMaker, Cloudburst was not designed for prediction serving per se; it’s just a faster platform for general-purpose serverless computing. You can find more details and a full evaluation in Section 6 of the paper.
The source code for both Cloudburst and Anna, the key-value store that Cloudburst uses as its storage layer, is available in the hydro-project organization on GitHub. (Cloudburst and Anna are components in a longer-term vision for programming the cloud that we call Hydro. Watch this space for more on Hydro in the future!)
The progress we’ve made developing Cloudburst in the last year has been encouraging, and we’re really only just getting started with our research agenda. We’re beginning to deploy some interesting applications on top of Cloudburst, including data science workloads, robotic motion planning, and more complex ML model serving. In the broader scope of Hydro, we’re also exploring richer programming models for the cloud, new economic models, and questions of autoscaling, scheduling, and consistency guarantees in the context of the stack we’ve developed thus far.
If any of those things sound interesting to you or if you have applications you feel would be a good fit, please reach out!