Serverless = Distraction-Free
The notion of “serverless” computing has gradually evolved over the last few years: from not managing servers/infrastructure, to the broader idea of focusing on your business logic — while outsourcing other tasks (like managing servers) to others. TL;DR: it is all about removing distraction.
While this particular idea is not new (think of PaaS, which was intended to deliver just that), a combination of several factors seems to have contributed to the rapid growth in the popularity of serverless. To name a few:
- Increasing maturity, efficiency, and flexibility of public cloud infrastructure. This makes outsourcing the infrastructure especially compelling, given that it is rarely a core business capability of a company (so paying someone you can trust, who can do it well and often better than you can, makes a lot of sense). Infrastructure is a distraction.
- Increasing adoption of cloud-native principles that encourage decoupling between application components (e.g., micro-services) as well as asynchronous communication between them (e.g., event-driven). As a result, middleware facilitating these patterns (discovery, routing, messaging, eventing, etc.) becomes boilerplate: everyone needs it, but no one really enjoys developing and maintaining it. Middleware is a distraction.
- Emergence (and increasing maturity) of the API Economy — a trend in software architecture where more and more capabilities are delivered (and consumed) as managed (and often metered) APIs, essentially making them a commodity. The plethora of AWS services (all fully programmable via APIs) is a good example. This creates a clear incentive to architect the application as a mashup of API calls, where business logic is essentially dominated by ‘glue code’ connecting between APIs. Commodity services are a distraction.
Application Patterns for Serverless Platforms
Realizing these opportunities in a serverless compute platform often requires focusing on particular application patterns. For different kinds of applications, the notion of distraction-free may mean different things: different kinds of infrastructure, different kinds of middleware, different kinds of services to rely on. Furthermore, when focusing on a particular application pattern, the platform can be optimized, potentially resulting in a compelling mix of convenience, performance, and cost.
Function-as-a-Service (FaaS): AWS Lambda
AWS Lambda is a good example of a serverless compute platform implementing the ‘function-as-a-service’ (FaaS) paradigm. Among other things, it provides fully managed polyglot runtime capabilities (including request-based scaling, scaling to zero, etc.) and enables easy integration with multiple event sources (including API Gateway and ELB). The ‘sweet spot’ of Lambda is applications centered around a collection of (not too) short-lived, ephemeral functions, triggered by events or requests, with potentially high load variability (including periods of idleness), as well as (relatively) low sensitivity to latency. Under the covers, Lambda runs each function in an isolated container that handles one invocation at a time, and reuses idle containers for later invocations based on internal policies (e.g., ‘warm’ containers are retained for a number of minutes, with 15 minutes being one commonly quoted figure, although AWS does not guarantee a specific duration). Is this the most important or common application pattern? Not necessarily. But the resulting developer experience is so convenient (and cheap, at least in the beginning) that developers fall in love with it, find creative ways to use it, and ultimately even push the limits of Lambda beyond the original design and optimization goals. Sometimes it works well, but in other cases it makes more sense to use other serverless compute platforms, designed and optimized for other application patterns.
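To make the function-centric model more concrete, here is a minimal sketch of a Lambda-style Python handler. The `handler(event, context)` shape follows the usual Python convention for Lambda; the `name` event field and the response fields are hypothetical and chosen only for illustration. Module-level state is used to show what reuse of a ‘warm’ container looks like from the function's point of view.

```python
import json

# Module-level code runs once per container (a "cold start").
# State kept here survives across invocations while the container stays warm,
# and is lost whenever the platform discards the container.
_invocation_count = 0


def handler(event, context):
    """Lambda-style entry point: one event in, one response out."""
    global _invocation_count
    _invocation_count += 1

    # "name" is a hypothetical event field used only for this illustration.
    name = event.get("name", "world")

    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": f"Hello, {name}!",
            # True only for the first invocation served by this container.
            "cold_start": _invocation_count == 1,
            "invocations_in_this_container": _invocation_count,
        }),
    }
```

Calling `handler({"name": "serverless"}, None)` twice in the same Python process mimics two invocations landing on the same warm container: the second response reports `cold_start` as false. Since the platform decides when containers are created and discarded, such state should be treated as an opportunistic cache, never as durable storage.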
One example of a serverless compute platform targeting an application pattern quite different from FaaS is Knative (as well as its managed version, Google Cloud Run). Unlike most FaaS implementations, each container in Knative is designed to serve multiple concurrent requests, with a configurable number of concurrent requests per container. Also, each Knative service is by design a scale-out, long-running service, which dynamically grows or shrinks by adding or removing containers (including scaling to zero) based on actual load. It is easy to see that the main sources of distraction for developers of scale-out, long-running web services are somewhat different from those discussed above. Here it is more about rolling out application updates (including advanced techniques such as canary deployment), flexible routing and load balancing, etc., in addition to the infrastructure aspects, which are common to many serverless compute platforms.
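The difference between the two scaling models can be illustrated with a bit of arithmetic. The sketch below is a deliberately simplified model, not Knative's actual autoscaler (which, among other things, averages metrics over time windows). The function name and the `max_containers` cap are assumptions made for this example.

```python
import math


def desired_containers(in_flight_requests: int,
                       target_concurrency: int,
                       max_containers: int = 100) -> int:
    """Estimate how many containers are needed for the current load.

    Simplified model: every container serves up to `target_concurrency`
    requests at once, and the service scales to zero when idle.
    """
    if target_concurrency <= 0:
        raise ValueError("target_concurrency must be positive")
    if in_flight_requests <= 0:
        return 0  # no load: scale to zero
    needed = math.ceil(in_flight_requests / target_concurrency)
    return min(needed, max_containers)
```

With 25 requests in flight, a target of 10 concurrent requests per container yields 3 containers, while a target of 1 (the typical FaaS model of one request per container at a time) yields 25. This is why per-container concurrency tends to suit long-running web services whose requests spend most of their time waiting on I/O.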
Beyond Web Applications
So, what other application patterns are common enough to justify the development of a dedicated serverless compute platform, focused on removing distraction for developers of that particular kind of application? Which building blocks can such platforms reuse from existing platforms, and which capabilities are missing?
In an attempt to answer these questions, we conducted multiple experiments (jointly with graduate students at Carnegie Mellon University). Some of the initial results are summarized in my recent talk at KubeCon + CloudNativeCon Europe 2019 (abstract, slides, video recording), and have been presented at SYSTOR 2019 (abstract).
Stay tuned for their coverage in future blog posts!