The State of the Serverless Art

Joe Hellerstein
Aug 13, 2020


Serverless computing is beginning to deliver on the vision of allowing developers to program the cloud.

Over the past 2 years, the Hydro team I lead at Berkeley’s RISELab has been running hard and fast in the area of serverless computing. We designed and built a stateful Functions-as-a-Service platform called Cloudburst. I like to say “Cloudburst puts the state into the state-of-the-art” in serverless computing.

In addition to the software prototype, papers on Cloudburst and serverless computing have been rolling out at quite a clip.

In this post I go into the background of the serverless computing space, and how we got to where we are today. For better or worse, this is a long post. If you want to skip the background, you can jump straight to descriptions of the new results.

Programming for The Biggest Computer Ever Built

I got interested in serverless computing because of my ongoing obsession with techniques for programming the cloud. Why obsess about that, you might ask?

To put a fine point on it, the public cloud is the most powerful general-purpose computer ever assembled. Forget the race for the “biggest supercomputer”, secreted away in a government lab. The cloud is orders of magnitude bigger, and growing every day. Even better, it’s not locked up in a lab — anyone can rent out its power at any scale, on demand. Everyone in computer science should be excited about this, as it’s arguably the biggest game-changer in computing access since the rise of the PDP-11 and UNIX in the early 1970s.

Unfortunately, raw computing power does not translate directly to useful computation. The cloud is not just massive, it is also a data-rich distributed system, which raises notoriously difficult computer science problems, including parallel programming, mid-flight failures of participating nodes, and data inconsistencies across distributed machines. For general-purpose cloud programming, developers today are forced to (a) write sequential programs to run on each machine they want to use, (b) ensure that code works together in concert to achieve desired outcomes in the face of the core CS problems described above, and (c) figure out how to deploy and manage all that complexity. As a result, it is very difficult today for developers to harness the power of the cloud at scale. To continue the analogy, the cloud is like the PDP-11 without UNIX and C — we’re programming it with the distributed systems equivalent of assembly code (though honestly it’s far harder than that from a correctness perspective).

Background: Where Did a Decade Go?

One of the reasons we’re moving so fast in my Hydro team recently is because my students and I have been beavering away in this space for over a decade at Berkeley. Ten years ago this fall, at ACM PODS 2010, I issued a call to arms in a keynote talk:

It is now within the budget of individual developers to rent massive resources in the world’s largest computing centers. But … this computing potential will go untapped unless those developers can write programs that harness parallelism, while managing the heterogeneity and component failures endemic to very large clusters of distributed computers.

Given that imperative, I assembled the BOOM project team back in 2010 to explore and demonstrate new ways to write programs. We started designing programming languages like Dedalus and Bloom that use the data in the cloud to drive computation, rather than worrying about which computer is doing what and when. Our early message was not lost on the tech press, which covered the ideas and flagged the promise of our work quite a bit.

But the agenda of general-purpose cloud programming got surprisingly little uptake in the ensuing decade, either in practice or research.

In retrospect, the likely distraction was easier money. Amazon Web Services spent the better part of the ‘teens demonstrating that well-capitalized firms could disrupt the enterprise software market without third-party developers or radical new software. Forget cultivating an iPhone-style “app for that” developer community! It was easier to go after aging giants like Oracle and IBM, and offer traditional software to traditional use cases, exploiting the radical new platform solely to lower administrative overheads.

And so a decade went by, and we wrote a bunch of papers, built some prototypes, and graduated some new PhDs. We felt pretty excited about the work, and we got plenty of academic recognition. But as the old joke goes, “if you’re so smart, why ain’t you rich”? I have to admit that Jeff Bezos made more money on AWS in the last decade than I did at Berkeley doing research. So to be clear, I’m not arguing that the hundreds of billions of dollars of “boring” cloud revenue was a bad play for businesses.

Nonetheless, the deeper technical revolution in cloud programming still awaits. Now that the cloud market has real competition, and the on-premises software market is back on its heels, we’re entering a new era where enabling the new stuff is going to matter.

Commercial Serverless: FaaS

As part of that new era, the cloud vendors have finally made some moves to empower developers outside their walls. The moniker they’ve chosen? Serverless computing. It’s not my favorite term, but it will have to do for now.

In its first incarnation, the idea of serverless computing has been embodied in an API called Functions as a Service (FaaS). As expected, Amazon was first with their AWS Lambda offering, but Microsoft Azure Functions and Google Cloud Functions followed quickly. The idea is simple: a developer writes a function in their favorite traditional programming language. They then upload the function to the cloud, and are given APIs to invoke the function remotely at will. Whenever data arrives at the function input, computation spins up in the cloud, and the result is passed to the output. The developer spends zero time configuring servers. The cloud resources auto-scale up and down dynamically according to usage, and the developer pays as they go, according to that usage.
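
The FaaS model just described can be sketched as a toy, in-process simulation. The `ToyFaaS` class and its `register`/`invoke` methods are hypothetical names for illustration only, not any vendor’s real API:

```python
# Toy simulation of the FaaS programming model: developers register
# plain functions, then invoke them by name; the "platform" handles
# dispatch. All names here are illustrative, not a real cloud API.

class ToyFaaS:
    def __init__(self):
        self.functions = {}

    def register(self, name, fn):
        # "Upload" a function: store it under a stable name.
        self.functions[name] = fn

    def invoke(self, name, payload):
        # Remote invocation: data arrives at the input, the result is
        # returned to the caller. A real platform spins up a container
        # on demand and bills per invocation.
        return self.functions[name](payload)

faas = ToyFaaS()
faas.register("double", lambda x: 2 * x)
result = faas.invoke("double", 21)  # → 42
```

In a real platform, `invoke` dispatches over the network to instances the provider scales up and down with load; the developer only ever sees the function and its name.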

To be clear, FaaS is only a first step in cloud programming. It is targeted at launching single-threaded sequential code in traditional languages, i.e. the “assembly language of distributed programming” I mention above. Still, while programming may be rudimentary, at least I don’t need to be a cloud devops wizard as well! And I only pay for what I use. That is, without question, progress.

In late 2018, a bunch of us in the RISELab at Berkeley started looking at serverless computing. The systems folks in the lab began a writing-by-committee effort to describe the movement of this bandwagon in one of their “Berkeley View” assessment papers. Having already spent a decade thinking about the future of cloud programming, I had stronger opinions. As a counterpoint to the committee effort, my team laid out our frank assessment of the basic pros and cons of first-generation FaaS in a paper entitled Serverless Computing: One Step Forward, Two Steps Back. In a nutshell:

  • Forward: Autoscaling. Third-party software is automatically scaled up and down according to usage patterns, in a pay-as-you go manner.
  • Back: Slow Data Access. Serverless functions see embarrassingly high-latency and costly access to stored data.
  • Back: No Distributed Computing. Functions are not allowed to communicate with one another except through high-latency storage, making most distributed computing techniques impossible.

Some folks, especially at the orange website, cast the article as a hit job from clueless academics. But the Morning Paper, which has followed our work since the beginning, got the spirit of it:

[This is] an appeal from the heart to not stop where we are today, but to continue to pursue infrastructure and programming models truly designed for cloud platforms

Also I like to think we’re not totally clueless (nor totally academic). While writing that paper we were already moving forward, getting past the challenges that the first-gen serverless offerings had dodged. In the papers and prototypes we’ve released since then, we are demonstrating what’s possible.

Stateful Serverless Infrastructure 1: Storage

In the early days of the RISELab, we wanted to demonstrate that the lessons of the BOOM project — notably avoiding coordination in the style of the CALM Theorem — could be realized in a high-performance system. So Chenggang Wu set out to build a key-value store (KVS) called Anna that embraced and extended those lessons.

The first goal of Anna—and the name of the original paper—was to perform well at any scale. What did we mean by that? Well, conventional wisdom said that systems have to be rearchitected every time they expand 10x beyond plan. Anna was designed to demonstrate that the lessons of coordination-freeness could result in a system that offered world-beating performance at the small scale on a single multicore box, and at massive scale on machines distributed across the globe.

The Anna story is richer than just the any-scale story. Anna is the subject of two earlier posts of mine (here and here) and two award-winning research papers (ICDE18 and VLDB19), and given the length of this post I’ll be brief here, focusing on technical headlines:

  • Anna is crazy fast. In simple workloads Anna is as fast as anything around at any scale. Under contention, Anna is orders of magnitude faster than the fastest KVSes out there, including Redis, Masstree, and Intel’s TBB hashtable. This is because Anna never coordinates (no locks, no atomics, no consensus protocols!), whereas those systems spend 90+% of their time coordinating under contention.
  • Anna offers flexible autoscaling. This is the hallmark of a good serverless infrastructure: scales up when you use it hard, scales down to save money and power when you don’t. Again, coordination-freeness is key: there’s no need to maintain distributed membership information, so the cost to add or remove nodes remains low at every scale.
  • Anna provides rich data consistency. Even under parallel and distributed execution, Anna can offer various consistency guarantees to allow programmers to reason about data across machines, including powerful classical notions such as causal consistency and repeatable read transactional isolation.
  • Anna provides unified caching/tiering. Many KVS systems today are designed for one level of storage: either disks, or RAM. In contrast, you can deploy Anna as a caching tier in memory, as a database on disk, or as a multitiered system with a smaller cache on top of a larger database. Anna moves data up and down the tiers, and provides uniform consistency guarantees across both.
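
The coordination-freeness running through these bullets rests on merging state via lattices, whose merge operation is associative, commutative, and idempotent, so replicas can exchange updates in any order, any number of times, and still converge. The grow-only set below is a minimal hand-rolled example of the idea, not Anna’s actual implementation:

```python
# Minimal grow-only set lattice: merge is set union, which is
# associative, commutative, and idempotent. Replicas that exchange
# states in any order converge to the same value with no locks,
# atomics, or consensus. Illustrative sketch only.

class GrowOnlySet:
    def __init__(self, elems=()):
        self.elems = frozenset(elems)

    def add(self, x):
        return GrowOnlySet(self.elems | {x})

    def merge(self, other):
        return GrowOnlySet(self.elems | other.elems)

# Two replicas apply updates independently...
a = GrowOnlySet().add("x").add("y")
b = GrowOnlySet().add("y").add("z")

# ...and converge regardless of merge order or repetition.
assert a.merge(b).elems == b.merge(a).elems == {"x", "y", "z"}
assert a.merge(b).merge(b).elems == a.merge(b).elems  # idempotent
```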

There is no storage offering from any cloud vendor today that compares with what Chenggang has done with Anna. I believe Anna identifies and can fill a significant hole in the current cloud architectures.

Stateful Serverless Infrastructure 2: Stateful Compute

As Anna was maturing, we were ready to move up the stack and contemplate programming. As our first phase, we decided to try and build a FaaS system that tackles the “two steps backward” that plague the commercial FaaS services. This means two things:

  1. Allow cloud functions to communicate with each other over the network. Commercial FaaS systems prevent 2 functions from communicating directly; they have to share any information via some slow distributed storage system. This is true even for simple stuff like passing the results of g(x) to another function f so you can compute f(g(x)). Beyond the basics, fast point-to-point communication is absolutely essential if you hope to do any non-trivial distributed computing other than batch jobs. The potential problem here is that serverless functions come and go pretty often, so their IP addresses aren’t reliable endpoints. This is solved with a classic level of indirection: a lookup service, implemented as some kind of lightweight distributed database. DNS is arguably too heavyweight to deploy for this setting, which is perhaps why the cloud vendors refuse to support networking for FaaS. Fortunately we have Anna—a lightweight autoscaling database. So functions can look each other up by “name” in Anna, and get a current IP address for that name. In a direct sense, Anna serves both as a database and as a Distributed Hash Table overlay network, a duality we explored years ago.
  2. Provide cloud functions with low-latency data access (LDPC). All the interesting challenges in distributed computing begin with data, or as some people like to say, the state of a program. Commercial FaaS vendors are targeted at stateless programs that simply map inputs to outputs with no “side effects” like data updates. But most applications of note these days manage data (state), often in complex ways. Adding to the complexity here is the trend towards disaggregation of storage from compute. In a big cloud environment, you don’t know when and how you need to scale out or upgrade your storage tier or your compute tier, so it’s best to keep them separate. The challenge is that storage services like DynamoDB or ElastiCache become very “far away” in latency terms. To get good latency, we still want some physical colocation of storage near our functions, even if the two tiers are managed and scaled separately. This is what we call Logical Disaggregation with Physical Colocation (LDPC). On this front we needed to innovate, and colocate a data cache on the same machines as the cloud functions, while still providing consistency in concert with Anna.
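
The level of indirection in point 1 can be sketched as a tiny lookup service: each function instance registers its current address under a stable name, and peers resolve the name just before connecting. The `LookupService` class and the addresses below are hypothetical stand-ins for Anna, for illustration only:

```python
# Sketch of function discovery through a key-value store, standing in
# for Anna: serverless instances come and go, so callers resolve a
# stable name to a current address at call time.
# All names and addresses here are hypothetical.

class LookupService:
    """Stand-in for a lightweight autoscaling KVS like Anna."""
    def __init__(self):
        self.table = {}

    def put(self, name, address):
        self.table[name] = address

    def get(self, name):
        return self.table.get(name)

kvs = LookupService()

# A short-lived function instance comes up and registers itself.
kvs.put("g", "10.0.0.7:5000")

# A peer resolves "g" by name before opening a connection.
assert kvs.get("g") == "10.0.0.7:5000"

# The instance is rescheduled on a new node and re-registers;
# callers pick up the new address on their next lookup,
# with no change to their code.
kvs.put("g", "10.0.0.9:5000")
assert kvs.get("g") == "10.0.0.9:5000"
```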

This is where a lot of our energy has been spent in the last year. I’ve learned a lot along the way — while the programming problem remains, the system infrastructure space was interesting in its own right, and I think we got a good handle on the big issues. Here is a rundown of the recent results:

  • Cloudburst System Architecture: The big ideas, overall architecture and some of the details are spelled out in our VLDB 20 paper on Cloudburst. We argue for the LDPC principle and describe the resulting architecture. Then the paper goes into detail on how we automatically encapsulate a developer’s mutable Python state in coordination-avoiding, composable lattice structures so arbitrary Python objects can be integrated into the coordination-free consistency model of Anna. We also describe how we achieve a simple version of causal consistency through these caches. Microbenchmarks show that we can outperform commercial serverless platforms by 1–2 orders of magnitude, and compete with hand-managed serverful distributed frameworks like Dask. We also show end-to-end numbers for two applications: ML prediction serving, and the Retwis twitter clone. Although we did nothing special to tune for ML prediction serving, we outperform AWS Sagemaker, a system specially designed for the task. (We also outperform AWS Lambda by quite a bit more.)
  • Hydrocache and TCC: The Hydrocache paper in SIGMOD 2020 delves deeper into the ways we keep caches and the database consistent, while still providing low latency. We set the consistency bar even higher in this paper, with the goal of offering transactional causal consistency (TCC). You do not get this level of consistency from typical distributed caches or KVS systems (looking at you, Redis Cluster!). Yet we show it can be done with very low latency. There’s no question that this paper is quite technical, though. Enjoy :-)
  • Atomic Fault Tolerance (AFT): The question of fault tolerance should be on your mind when reading about any distributed system. The FaaS vendors are quite naive about it right now — they tell developers that any function may fail and have to be retried, so it’s up to the developer to ensure that their code is idempotent, meaning it has the same effect whether run once or more than once. That’s not very nice, nor is it easy to guarantee. (OK, pop quiz time. Stop what you’re doing. Did you write any code this week? Cool. Is it idempotent? How do you know? Is it reasonable to expect you to worry about that? I thought not!) But it gets worse. If your function modifies stored state (say by issuing a call to a database), and it fails a fraction of the way through execution, it will have visibly run a fractional number of times. That is, the partial execution of your function is now exposed in storage and may be read by other code. This paper points out that what’s needed for FaaS fault tolerance is Atomicity, i.e. the “A” from the ACID guarantees: all of your function’s external effects should occur, or none should. Idempotence then becomes easy — just attach a unique ID to each request, and no matter how many retries occur, the request’s effects are applied at most once. That’s how idempotence is supposed to be exposed. The paper leans on our prior work on Read Atomic isolation, and provides a surprisingly simple implementation as a “shim” layer that works in any FaaS architecture. We have it running in the Cloudburst/Anna stack, but the paper shows how to deploy it in the AWS Lambda/S3 stack.
  • Model Serving. Our first foray into model serving in the VLDB 20 Cloudburst architecture paper whetted our appetite to do better. A few years back, when my co-conspirator Joey Gonzalez was leading the Clipper model serving project, I needled him by saying “hey I think all these optimizations you’re exploring — cascades and ensembles and whatnot — could be written as simple Bloom programs”. And I proceeded to sketch them as dataflows on a whiteboard. Well, with the Cloudburst infrastructure under his belt, Vikram Sreekanti took up that idea and made it real. He implemented a simple dataflow language called Cloudflow, and deployed it over Cloudburst. Then he proceeded to explore optimization opportunities exposed by the combination of explicit dataflow and stateful serverless computing, including (a) placing code on the right hardware resources (e.g. GPUs) or colocated with the right data (e.g. in a Hydrocache), (b) autoscaling different stages of an ML pipeline independently, (c) fusing operators so they run colocated with each other, and (d) running competing versions of operators in parallel to let the fastest execution win. What’s really nice here is that the ML code remains a black box, so this is compatible with your favorite ML libraries (TensorFlow, PyTorch, MXNet, Scikit-Learn, etc.). Joey and I feel that Vikram really made the case that this is the right way to architect a model serving system.
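
The request-ID trick from the Atomic Fault Tolerance bullet above can be sketched in a few lines: a shim records which request IDs have already committed their effects, so blind retries after failures become harmless no-ops. This is an illustrative sketch of the idea, not the AFT paper’s actual implementation:

```python
# Sketch of idempotence via unique request IDs, in the spirit of an
# AFT-style shim: a request's writes are applied all-or-nothing, and
# at most once, so failure-driven retries are safe.
# Illustrative only -- not the actual AFT implementation.

class IdempotentStore:
    def __init__(self):
        self.data = {}
        self.committed = set()  # request IDs whose effects already applied

    def apply(self, request_id, writes):
        if request_id in self.committed:
            # Retry of an already-applied request: do nothing.
            return "duplicate: no-op"
        # Apply all writes, then mark the request committed. (A real
        # shim must make this step atomic with respect to failures.)
        self.data.update(writes)
        self.committed.add(request_id)
        return "applied"

store = IdempotentStore()
store.apply("req-1", {"k": 1})        # effects applied once
store.apply("req-1", {"k": 1})        # retry after a timeout: no-op
assert store.data["k"] == 1
```

The point of the “A” in ACID here is that no reader ever observes a half-applied request: either all of `writes` are visible and the ID is committed, or none are.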

In sum, Cloudburst is our answer to the critiques of FaaS we raised two years ago. Cloudburst shows that FaaS can take three steps forward, and provide an underpinning for general-purpose cloud programming. Most programming tasks that can benefit from the world’s biggest computer absolutely require efficient and consistent management of program state, and that’s where much of the hard computer science in this space lies.

Summing Up


Obviously all this work was done by a team. The lion’s share was done by the lead PhD students, Vikram Sreekanti and Chenggang Wu, who are truly a dynamic duo. Joey Gonzalez was my co-conspirator as faculty advisor. Other contributors include Saurav Chhatrapati, Charles Lin, Yihan Lin, and Hari Subbaraj, with wise input from Jose Faleiro, Johann Schleier-Smith, and Alexey Tumanov.

Our ability to slay some dragons in this space in recent years is also thanks to a long line of research from an even bigger group of collaborators from BOOM and P2 days. There’s more to come from our end, and I expect to see more good stuff from the community at large. Programming the cloud is one of the biggest challenges and opportunities in computer science, and we’ll continue pushing forward.

In addition to NSF CISE Expeditions Award CCF-1730628, this research is supported by gifts from Alibaba, Amazon Web Services, Ant Financial, CapitalOne, Ericsson, Facebook, Futurewei, Google, Intel, Microsoft, Nvidia, Scotiabank, Splunk and VMware.