From Servers to Serverless (long-form)
Note: a condensed version of this article appears on acloud.guru.
Serverless computing, or just serverless, is a new cloud computing paradigm that promises to deliver lower overall development and operational costs, due to a combination of new technologies and pricing methods. Like other pieces of organically coined technical jargon, it does not have a precise, standardized definition, and is used interchangeably to refer to commercial products, open-source projects, and aspects of the underlying technology itself. In addition, the term can misleadingly suggest that no actual servers are involved; in reality, the servers are merely hidden from the developer’s view.
Instead of trying to formulate one true definition of serverless, this article lays out a possible history of its evolution, seen through the lens of monetary cost savings. In this scenario, there is the loosely-defined developer (a person or company), and a commercial entity called the provider (generally a company). The developer wants to use a certain number of Internet-connected servers to build a new application, and the provider wants to make the developer’s task easier by taking over varying degrees of mundane, lower-level tasks for them, in exchange for cold, hard currency.
This trade-off of labor for cost between developer and provider is the focus of this article, as a means of explaining how “serverless” came to be. The precise trade-off chosen by any given developer is affected by many factors, such as their budget, reliability needs, and general degree of do-it-yourself spirit. Serverless computing is a new point in this spectrum of trading labor for cost, and in order to understand how we got there, we first start at servers.
In the beginning, there were servers.
These were large, flat, physical boxes with expensive, industrial-grade computers inside. Servers were procured by buying or leasing, or building them from parts. They were generally placed inside data centers with redundant power supplies and fast Internet connections. The developer would generally have to perform (or pay someone to perform) manual tasks like physically installing the box inside a metal frame called a rack, checking for and replacing malfunctioning parts, physically pressing a power reset button when necessary, and putting out literal fires caused by overheating equipment.
Only once the server was physically placed in a server rack, powered up, and connected to the Internet could the developer start using it by installing an operating system. After the operating system came installing and configuring various software like web servers, databases, and caches. The actual application’s code would then have to be copied to each server and started, and finally, the servers would have to be monitored, operating system security patches applied, and new servers added when application usage increased or servers failed.
The time and effort involved in administering a server in this manner can far exceed the time required to develop an application, and represents an ongoing cost for as long as the application is running. The benefit of this model, however, is the absolute control over the physical hardware, similar to buying a server and connecting it to a home Internet connection. Since the developer retains complete control of the hardware and all software running on it, a $10,000 server will deliver a constant $10,000 worth of performance to the developer, unlike some of the value-added models discussed later.
Today, this model of running servers has been streamlined into commercial forms such as server colocation, bare metal hosting, and dedicated servers, each varying in the degree of manual labor required, method of hardware procurement, and pricing model. It is still best suited for those who need absolute control over their own hardware (for maximum performance, custom hardware, or legal compliance reasons, for example), and are willing and able to put in the effort required to maintain it. Generally speaking, this group comprises large organizations with dedicated I.T. staff and specialized needs.
With the advent of virtualization technology, and particularly hardware support for it baked into CPUs around 2005, a physical server could be efficiently split into multiple, smaller virtual servers (or virtual machines). Each virtual server would run its own separate copy of a standard operating system like Linux, Windows, or FreeBSD, completely isolated from any other virtual servers that might be running on the same physical machine. The expensive $10,000 server could now be split into five smaller $2,000 servers, each running a different operating system, with no modifications required to any existing software. If two such servers had been procured by an over-zealous developer, and were idle 50% of the time, they could be consolidated into a single server running two virtual machines.
Virtualization works by having an additional hardware-assisted layer of software called a hypervisor that sits below a traditional operating system. The purpose of the hypervisor is to effectively emulate a physical server, with or without the cooperation of the operating system. Virtualization had an interesting side effect: since the hypervisor acts as an intermediary between a virtual machine and the actual hardware, a running virtual machine could be “frozen” at any time into a system snapshot, essentially a large file. Later, this file could be copied to a different server, and an exact clone of the virtual machine could be restored. A snapshot can range from comprehensive, including the running state of programs and the contents of allocated memory, to minimal, containing just an image of the “hard disk” that the virtual machine sees.
Commercial hosting providers took advantage of this by offering virtual private servers, or VPSs, as an alternative to running a dedicated server. By installing a standard operating system like Linux or Windows into a virtual machine, and then freezing it into a snapshot after installation (and configuration), they were able to obtain master snapshots of the base state of all popular operating systems. These snapshots could then be copied to any number of physical servers, and restored to a running virtual machine in a matter of minutes or seconds. Since this process could be repeated any number of times, large fleets of virtual servers running identical software could be created and destroyed quickly, rather than having to set up physical servers individually.
Economically, the providers were also able to make better use of their server fleet. Previously, if they had wanted to offer their customers a small $500 server in addition to large $10,000 ones, they would have had to maintain an inventory of $500 servers, which are generally neither cost effective nor performant. With virtual machines, a single $10,000 server could be efficiently split into a mix of five $1,000 servers and ten $500 servers (for example), each possibly leased out to a different developer. Furthermore, the mix of virtual servers could be altered at any time, allowing not just a wider range of server sizes to offer to developers, but also more efficient utilization of their inventory of physical servers.
The use of VPSs gave developers a few new abilities:
- Scale. The server operating system only had to be configured once, and the frozen state could then be re-used to quickly create many, identical virtual servers. This meant that individual servers no longer had to be manually set up, and large fleets of identical servers could be created very quickly.
- Reliability. If the physical server failed for any reason (e.g., due to a failing hard disk), the hosting provider could automatically re-create the VPS on a different physical server and automatically switch the old IP address over to the new VPS. Users would see service disruption on the order of minutes, rather than hours for physical server repairs, and developers would no longer have to wake up in the middle of the night to fix routine hardware failures.
- Cost Control. Not only were more types of smaller (and cheaper) instances available than with physical servers, they could be mixed arbitrarily with larger instances to optimize a developer’s cost vs. performance needs. In addition, because a VPS could be created and destroyed relatively cheaply, providers generally billed for their use by the hour or month, rather than year. Large fleets could be created for a few hours, used, and then destroyed without incurring the cost of an annual server lease (for example).
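The cost-control point above can be made concrete with some back-of-the-envelope arithmetic. The prices below are hypothetical, chosen only for illustration:

```python
# Hypothetical prices: a dedicated server leased at $1,200/year vs.
# VPS instances billed at $0.05/hour each.
dedicated_annual = 1200.0          # one year, paid up front
vps_hourly = 0.05

# A burst job: 20 identical VPS instances for 6 hours.
burst_cost = 20 * vps_hourly * 6
print(f"burst fleet cost: ${burst_cost:.2f}")   # $6.00

# Securing the same burst capacity via annual leases would mean paying
# for 20 servers for a full year, though they are needed for only 6 hours.
lease_cost = 20 * dedicated_annual
print(f"annual lease cost: ${lease_cost:.2f}")  # $24000.00
```

The three-orders-of-magnitude gap is the whole point of hourly billing: short-lived fleets become affordable.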
Virtualization technology had a large impact on the economics of running fleets of servers. It led to the popularity of cheap, managed installations of various software frameworks like WordPress, as well as the now-ubiquitous $5 per month virtual server. Virtual private servers also feature frequently in conversations, in questions like “but why can’t I just run it on a $5 VPS?”, which is often a valid question, but one that often has valid answers as to why not.
In an independent development, a feature called cgroups (control groups) was added to the Linux kernel in 2007, offering a way to effectively replicate a limited form of virtualization within the Linux operating system itself. Contributed by Google, cgroups let the kernel meter and limit the resources consumed by groups of processes; combined with other kernel features (notably namespaces), this made it possible to safely bundle a Linux program and all its dependencies into a container image, which could then be cloned and run on other Linux machines, with a degree of isolation from other containers running on the same machine. Although this sounds very similar to virtualization, there are a number of important technical differences between virtual machines and containers.
One key difference between containers and virtual machines for the purpose of this discussion is the following: virtual machines run an operating system on top of a hypervisor, but containers run programs on top of an operating system.
Starting a virtual machine can involve either the entire operating system boot process, or restoring running system state from a frozen snapshot. Both of these procedures can take on the order of minutes to complete. After starting, running a virtual machine involves running an entire operating system, with all the associated CPU and memory overhead. Containers, on the other hand, run within a single host operating system, and only incur the CPU and memory overhead of the programs being run within the container. When a container is started, the application program inside it starts like a regular program, but is restricted by the operating system to run in isolation on only a slice of overall system resources.
This feature leads to two interesting properties of containers relative to virtual machines:
- Lower resource usage. Since containers do not have an entire operating system’s worth of background processes, device drivers, and other paraphernalia running along with the developer’s programs, the memory and CPU overhead of a container is far less than that of a virtual machine.
- Faster startup. Virtual machines must be either booted through a standard operating system boot procedure, or restored from a suspended state. A container, on the other hand, starts with latency that is comparable to double-clicking a program on your desktop, while offering many of the benefits of virtual machines.
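The faster-startup point is easy to observe: a container start is dominated by the cost of launching a fresh process (plus some filesystem and namespace setup), whereas a virtual machine must boot or restore an entire operating system. A rough, machine-dependent measurement of the process-launch component:

```python
import subprocess
import sys
import time

# Time a cold start of a fresh interpreter process. Starting a container
# adds image and namespace setup on top of this, but no OS boot, so the
# latency stays on the order of milliseconds rather than minutes.
start = time.perf_counter()
subprocess.run([sys.executable, "-c", "pass"], check=True)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"fresh process start: {elapsed_ms:.1f} ms")
```

On most machines this prints a double- or triple-digit millisecond figure, against tens of seconds for a typical OS boot.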
Providers now had a new tradeoff available for developers: if the developer did not need to control or customize various low-level hardware facets of a virtual machine or physical server, and could express their application and all its dependencies as a file system (similar to a ZIP file) that ran on a handful of operating systems that supported containers (notably, Linux and FreeBSD, but now also Windows), the provider could run the application inside a container rather than a virtual machine. Since containers consume less memory and fewer CPU cycles than entire virtual machines, providers could utilize their fleet of physical servers even more efficiently. So instead of packing ten $10 virtual machines to a single server, they could now perhaps pack fifty $5 running containers into the same machine, due to the lower memory and CPU usage requirements of the containers.
With containers, developers had many of the benefits of virtual machines, such as installing custom system libraries and modifying many operating system settings, but with a faster deployment process than virtual machines. Since container images were generally small and easily testable, they could be quickly created by scripts and tested locally. The containers themselves would start much faster, and due to the use of layered filesystems by container runtimes, images could be built, updated, and uploaded much faster than virtual machine images.
However, one thing that had not yet changed from renting VPSs was that providers ran containers continuously and charged for the number of hours that each container ran for. This billing would occur even if the container consisted of an idle program that did no work, matching the experience of renting a VPS by the hour.
Open-source container runtimes (August 2017): Docker Engine, LXD
Open-source distributed container runtimes (August 2017): Mesos, Kubernetes
Commercial managed container runtimes (August 2017): AWS Elastic Beanstalk
Platform as a Service (PaaS)
At some point, providers may have realized that many developers were using similarly configured containers to run their applications. For example, a typical web-facing setup might contain a web server and a web application runtime and framework, like Ruby on Rails or PHP, along with the developer’s application code. The application would typically be exposed as an HTTP-based API, allowing easy access from web frontends, native mobile applications, and desktop applications.
Of all the components and auxiliary software that went into the “stack” of software, the only one that most developers truly cared about was their own application code, or their business logic. If the remaining components (e.g., web server, language runtime, system libraries, etc.) could be managed by the provider, then the developer would be free to focus on writing just their application code, reducing development time as well as ongoing maintenance costs. The provider would be in charge of setting up a largely standard OS environment for the developer, which the developer could count on for consistency.
In order to achieve this “business logic only” goal, the serverless methodology is to factor out all the common parts that are not pure business logic. The developer can choose to relinquish control of everything else to the provider, counting on the provider to create a standard environment in which to run the developer’s code. The provider therefore takes on all the tasks associated with maintaining and running a scalable web service, from provisioning virtual or physical servers, to configuring the operating system and server software used, to determining the exact version of a language’s runtime to use (in most cases). Since the developer no longer has to concern themselves with these tasks, the size of the code required to bring their application live is drastically reduced.
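To illustrate the code reduction, compare a minimal self-hosted HTTP service with its “business logic only” equivalent. This is a hedged sketch using Python’s standard library; real platforms define their own handler signatures:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Self-hosted: the developer owns the server loop, sockets, headers,
# and process lifecycle, in addition to the actual logic.
class Greeter(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("", 8080), Greeter).serve_forever()  # runs until killed

# Platform-managed: only the business logic remains; the provider
# supplies the server loop and invokes this function per request.
def greet(request):
    return 200, "hello"

print(greet({}))  # (200, 'hello')
```

Everything in the class above except the response body is boilerplate the provider can standardize away.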
While this approach led to the evolution of the Platform-as-a-Service, where continuously running containers could be dynamically created and destroyed by developers, there was a further cost saving to be had.
Due to the relatively small size and startup time offered by containers, providers realized that they could run the developer’s application only when an actual request was received, and then immediately shut it down to save resources. Instead of keeping a copy of the developer’s code running continuously at all times (and incurring billing charges), providers could wait for a user request to come in, and only then create a container with the developer’s code to service the request. After the developer’s code responds to the request, the container would be destroyed, freeing up system resources for other requests (and other developers’ code).
This leads to one of the defining characteristics of the serverless paradigm: short-lived, container environments that are created to service individual requests and other events. These events can be HTTP requests, WebSocket connections, work queue items, and database notifications, among many other possible triggers.
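The lifecycle above can be sketched as follows. This is a toy simulation of the provider’s side, not any real platform’s code: for each incoming event, a fresh, isolated environment is created, the developer’s handler runs inside it, and the environment is then destroyed.

```python
def handler(event):
    # The developer's code: stateless, and written as if it exists
    # to serve exactly one event.
    return {"status": 200, "body": f"hello, {event['name']}"}

def dispatch(event):
    # Simulate the provider: create a fresh environment per event...
    env = {"tmp": {}, "handler": handler}   # stand-in for a new container
    try:
        return env["handler"](event)
    finally:
        env.clear()                         # ...and destroy it afterwards

print(dispatch({"name": "alice"}))  # {'status': 200, 'body': 'hello, alice'}
```

Because no state survives between calls to dispatch, any machine in the fleet can service any event.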
From the provider’s point of view, requests for multiple developers’ applications can be efficiently distributed across a large fleet of web servers. Since each container is short-lived, and only consumes resources for the duration of servicing a request, the provider’s fleet can be utilized even more efficiently than having continuously-running containers (which consume memory and CPU resources even when idle). Furthermore, since individual requests can be routed to any physical machine in the fleet, providers were able to offer developers the holy grail of scaling: instant, massive parallelism, without any service degradation for large, sudden spikes in traffic (say, after a Super Bowl advertisement).
For developers, the serverless experience offers many other perks as well, in addition to the instant parallelism.
- Simpler code: instead of writing long-lived web serving programs, the serverless paradigm encourages small, stateless programs (such as “functions”) that can be created and destroyed on demand. Instead of keeping track of every client connected to the server, the developer’s code can now assume that it is created to communicate with exactly one client.
- No provisioning: instead of having to determine the capacity needs for an application before it is in use, developers can now deploy a serverless application without worrying about how many servers or instances to reserve.
- No paying for idle time: since serverless providers no longer run virtual machines or containers continuously, they can no longer bill for server instances by the hour. With a serverless application, developer code is only executed in response to trigger events, on an “invisible” fleet of actual servers hidden from the developer’s view. Thus, the developer has no control over how many servers or instances they want to reserve, and the only sensible billable unit becomes the total amount of time actually spent by the provider servicing all the incoming events.
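Under this model, the bill is a function of aggregate execution time rather than reserved capacity. The prices below are hypothetical, though shaped like real serverless pricing (a per-request fee plus a fee per unit of memory-time):

```python
# Hypothetical serverless pricing: $0.0000002 per request,
# plus $0.00001667 per GB-second of execution time.
per_request = 0.0000002
per_gb_second = 0.00001667

def monthly_bill(requests, avg_seconds, memory_gb):
    compute = requests * avg_seconds * memory_gb * per_gb_second
    return requests * per_request + compute

# A busy month: 1 million requests, 100 ms each, 0.5 GB of memory.
print(f"busy app: ${monthly_bill(1_000_000, 0.1, 0.5):.2f}")
# An idle month: zero requests means a bill of exactly $0.00.
print(f"idle app: ${monthly_bill(0, 0.1, 0.5):.2f}")
```

The idle case is the telltale sign: a deployed but unused serverless application costs (almost) nothing, unlike an idle VPS or container billed by the hour.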
A major requirement for writing serverless code, however, is to express business logic as programs that are created to serve a single trigger event, and only that event. After the program finishes handling the trigger event, it is destroyed. This is in contrast to traditional client/server programming, where a single program with multiple threads may accept many simultaneous HTTP connections and respond to them (for example), using conveniences like shared memory between requests. There is, thus, a new onus on developers to write or rewrite their applications in a different paradigm than what they were used to. In many cases, however, this rewrite can actually reduce the amount of code required for a particular application.
The serverless paradigm has applications beyond web serving, although that is currently the largest use case. There are already serverless data warehouses, serverless databases, and serverless stream processing systems. In the future, we can probably expect this paradigm to extend to other classes of products, since the benefits for developers are so glaringly obvious.
An easy way to tell if a product or service is “serverless” is often to look at the pricing page. In particular, a serverless solution will have the following properties:
- No provisioning: if a product or service requires the developer to reserve an explicit number of instances/servers/containers, it is likely not a serverless platform, since it requires explicit capacity provisioning. A serverless platform adds capacity as and when needed, keeping the details hidden from the user.
- Bill by usage: if a product or service bills by the time that an application is kept available for use, rather than for the actual usage of the application, then it is likely not a serverless platform. A quick way to check is to ensure that an idle application (i.e., one without any user activity), when deployed to the platform, will result in a negligible bill at the end of the month.
The following figure attempts to break down the product categories described here based on how much of the stack is managed by the developer rather than the provider, using the loose definitions of each laid out in the introduction. Each row represents a slice of the stack from the first line of application code at the top, to the physical data center that hosts the server ultimately running the application. Note that the categorization, and the example commercial products, are neither comprehensive nor static. Product categories and features change over time, blurring the lines between categories.
One possible categorization of different product classes based on the labor tradeoff between developer and provider.
This has been a somewhat subjective view of the evolution of serverless computing. Future posts will look at more defining features of serverless platforms and architectures, as well as some of their caveats.
Many thanks to Oliver Bowen, Drew Firment, Merhawi Redda, Kapil Thadani, and Ishaan Joshi for comments and edits on versions of this post. Please send corrections, suggestions, and comments to email@example.com.