This article was originally written in the beginning of 2014, originating as a Keynote I gave at USENIX. I made some small edits toward the end of that year. It was never finished and there remain placeholders for graphics. Trends highlighted here have happened in some ways, such as the invention and increased popularity of GraphQL. Serverless technology is changing how we manage and use immutable infrastructure. Many lessons are yet to be learned, and I may yet publish a revised article! — Erica Windisch (July 2017)
Today I come to you to speak of a future. A future of Change. Not just of Change Management, but of the world. Of the internet. Of our industry. I talk about not just changing the future, but managing the changes of things to come.
TL;DR? This article is the long-form version of the following deck:
Past & Present
Today, you are probably managing servers, VMs, and perhaps containers. You might also manage network infrastructure.
But whether you are managing physical machines, virtual ones, or both… if you are here today, you are probably using configuration management, or planning to use configuration management in your organization.
Today, primarily, these solutions look like:
<< Graph >>
- A configuration management service provides a configuration matrix of your recipes, manifests, etc.
- Servers are booted,
- connect to a network,
- and subscribe to the management service.
- They attempt to reconcile the system’s current state with the canonical state it “should” be in.
- These systems place an emphasis on idempotency.
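The reconcile step in that list can be sketched in a few lines. This is only an illustration of the idea, not how Chef or Puppet are implemented: `converge`, `node`, and `canonical` are hypothetical names, and a plain dictionary stands in for real packages, files, and services.

```python
def converge(current: dict, desired: dict) -> list:
    """Bring `current` in line with `desired` and return the changes applied.

    Hypothetical sketch: keys stand in for resources (files, packages, ...)
    and values for their configured state.
    """
    changes = []
    for key, want in desired.items():
        have = current.get(key)
        if have != want:
            changes.append((key, have, want))
            current[key] = want  # act only where state has drifted
    return changes


node = {"ntp": "absent", "sshd_port": 22}
canonical = {"ntp": "installed", "sshd_port": 22}

first_run = converge(node, canonical)   # applies the one missing change
second_run = converge(node, canonical)  # idempotent: nothing left to do
```

Running it twice is harmless because the second pass finds nothing to change, which is all that idempotency promises.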
If you are really savvy, machines boot into a set of preconfigured images. You might even use Chef-Zero, Chef-solo, or Masterless Puppet. Yet, they all largely follow the same paradigm.
One of my problems with these tools is that they are overly complex. The fact that idempotency is a concern and a design tenet only highlights the legacy that is encapsulated in these products. It isn’t that idempotency is bad, necessarily; it is that it presumes we might EVER run something twice. In fact, a design tenet should be that we never allow running configuration management twice. That our provisioning is WORM: write once, read many. We use configuration management to build an artifact and we use that artifact to boot our images. That is one way of expressing immutable infrastructure.
Are we managing configurations, or are we managing systems, and is there a difference? Systems configuration and management are indistinguishable.
The easiest way to enable WORM-based provisioning today is to use containers.
You’ve probably heard by now, but Containers are coming. It’s happening. It’s been happening for a long time. However, whereas Docker is new, Containers are not. They have been around for well over a decade now.
But, what are containers? They are just a security construct. They are not a mythical box. They are not even virtualization which employs interesting hacks around CPU interrupts and, more recently, processor extensions. No, they are just processes on Linux with a sprinkling of security.
While the security version of the story sounds like virtualization, it isn’t. It is not virtualization, nor is it really LIKE virtualization. Instead, it’s a capability not unlike SELinux and AppArmor.
This is why I find questions about the security of Docker and containers so interesting — because it is not a removal of any layer of the system, it is the addition of one. It is the addition of constraints on processes that were not there before.
However, what is disruptive is the newfound portability and transportability of those containers. What we have done is not to containerize processes on a host, but to enable self-contained application images to become portable between systems.
Portability and transportability are key values
that Docker Engine introduces to containers.
Of course, containers are NOT a replacement for configuration management. The fact that they simplify and enable WORM-based provisioning reduces or, in some cases, eliminates the need for other configuration management tools in the provisioning chain. However, there remains a need to manage change.
It’s not just about securing your processes in a container, although that is part of it, it is about portability. But it is not just about portability of the image, it is about the portability of the services that run within them.
Containers are a single part of a wider paradigm shift away from servers. When we view our applications as running on a platform, rather than running on servers, it’s not hard to see the writing on the wall:
The operating system is dead
Now, this might appear overly dramatic. However, look at the new container-centric operating systems such as boot2docker, CoreOS, and Project Atomic, Red Hat’s new operating system. We can bring up a kernel running Docker, and Docker runs self-contained applications that are built FROM your traditional distributions, but are not actually running those distributions. Instead, they build an image based on a distribution and run a single process from it. In the container micro-services model, the application runs in a container and only uses shared libraries, possibly uses the distribution’s package for the application service itself, and less commonly utilizes the distribution’s binaries.
Two years ago, I said to a colleague: The kernel is dying.
Now I fear the distribution is dying.
We are following a trend of trimming the fat. Two years ago, when I spoke of the kernel dying, I was referring to Linux in the sense that it was time for the kernel to fast and tighten its belt. That with virtualization supporting a homogeny of hardware, our kernel would become thinner. And in a sense, this might still prove true…
But what is currently happening is that Linux as a kernel is, in a way, becoming resurgent. Containers mean that the kernel is relevant again. The kernel is not just a place to manage hardware drivers, but a layer of protection for your applications. Without the kernel, applications would be protected by nothing but their hardware or VM, and would be absent the controls offered by SELinux, AppArmor, and cgroups. Interpreted and bytecode languages such as Python, Ruby, and Java suddenly have advantages to running on Linux. Whereas a couple of years ago the greatest friction to leaving Linux was the lack of alternatives and poor support for BSD on various clouds, we’re increasingly entering or reentering an age where the friction of leaving Linux will be based upon the capabilities of the kernel and the platform itself.
Container technology finally leverages the things the kernel does well.
With changes in how people use Linux, with virtualization and containers, Linux will change to fit these paradigms. While there will always be someone wanting their unique driver in the kernel, the number of drivers people use in production, and the number that ship with Linux, are likely to decrease. This will happen as more users only use Linux in virtualized environments, or with containers.
Honestly, my view of containers is probably a bit more forward-leaning than most. I believe the pendulum will swing back toward virtualization again, and that we will see with containers and traditional virtualization what we saw with para-virtualization and hardware virtualization. The pendulum may even swing a few times before we settle on what I’m calling Hybrid Containerization. With Hybrid Containerization, various syscalls will have “constrained” versions that operate under more restrictions. We might even utilize techniques of virtualization.
There is, of course, the matter of orchestrated per-tenant VM clusters which run clouds of containers — such as Google does.
Regardless, the point is that micro-services mean compartmentalization, and with containers that means small distributions and a smaller attack surface per service (whether or not it reduces the total attack surface is a matter for another debate)… but it certainly means smaller pieces.
It makes sense, really. Because micro-services are a misnomer. They are commoditized processes, just as virtual machines have commoditized hosts.
The great thing about commodities is that they become Just Another Thing.
Its close alignment to the Internet of Things is
the disruptive element of the micro-services movement.
You’ve probably heard by now, The Internet of Things is upon us. It is happening, but it has been happening for only a short time now. It is happening, but the builders of this new world come in many shapes and sizes. Embedded software is notoriously bad: it is buggy and hard to manage, support, and update.
But… Suddenly, we want to connect these devices. We’ve already been connecting these devices in the form of PDUs, serial consoles, switches, and routers. And we have done a terrible job at it. Certainly, there are variants of these devices that are great, but it is the exception. Yes, I include Cisco and Juniper in the “exceptions”. Other vendors? Especially the low-cost options? Anyone that has ever used a “web-managed” switch will tell you that they are not very good.
Those problems often come from the costs and constraints of the hardware, limited expertise in writing such software, a frequent necessity to write larger portions of low-level stack which most higher-level systems folks on servers haven’t had to write since the 70’s and 80’s, since BSD sockets became a thing, or in the 90’s at the latest — since WinSock became a thing.
It’s no surprise that embedded network programming is usually bad, if most developers are working, not only with 80’s hardware, but with developer libraries of a similar vintage — as far as networking is concerned.
Undoubtedly, the best solutions prove to be the ones built on top of more capable hardware that can leverage modern software libraries — and most of that runs Linux, either via a small customized bootstrap userspace, or with a distribution such as Android.
Furthermore, even those devices that ARE good integrate poorly with our configuration management tools. If we cannot manage the devices we already know and use, how will we manage the devices to come?
First, Linux will grow around supporting these new platforms.
For years, we’ve been hearing about the Cloud, about Cloud Computing, as well as “cloud this and that”.
What is the Cloud? It is not centralization, but rather automation of device management. Unfortunately, the term has been overloaded to express any Service, rather than the unique automation of physical and virtual devices.
If we accept that the term has changed, and I think most now accept this, then the Cloud basically means: It’s on the internet.
However, what we don’t have is the hyper-connected, federated cloud of our devices. This is the new internet paradigm. The one of connected services and automation.
What we must build, and what it seems is to come, is a new cloud. That cloud will be…
So MQTT can’t change things well, but…
Should Things Change?
State changes. We like to pretend it doesn’t. The concept of immutable infrastructure is terribly broken at worst, and a misnomer at best. It’s true that Docker images, for instance, are WORM media. However, the containers themselves operate on a Read/Write copy. However, even if the disk were Read-only, the memory would still be mutable. That is to say, there would still be state.
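The split between a WORM image and a container’s read/write copy can be modeled with a layered lookup. What follows is a toy model, assuming nothing about Docker’s actual storage drivers: the file paths and the `start_container` helper are hypothetical, and a dictionary stands in for a filesystem.

```python
from collections import ChainMap
from types import MappingProxyType

# Hypothetical image contents; MappingProxyType gives a read-only view,
# standing in for the write-once image.
image_layer = MappingProxyType({
    "/etc/app.conf": "debug=false",
    "/bin/app": "<binary>",
})


def start_container(image):
    """Return a filesystem view: a fresh writable layer over the image."""
    return ChainMap({}, image)


container = start_container(image_layer)
container["/etc/app.conf"] = "debug=true"  # the write lands in the top layer
container["/var/log/app.log"] = "started"  # state the image never had

writable_layer = container.maps[0]  # everything this container changed
```

The image is untouched, yet the container has already accumulated state of its own in `writable_layer`. That layer is exactly what gets thrown away when the container is discarded.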
There is no such thing as stateless.
Immutable infrastructure doesn’t kill the Chaos Monkey.
The reality is that while little in the universe is static, we like to treat things as constants. We want immutable infrastructure. To create images and disregard their changes.
However, we cannot prevent state from changing.
We cannot kill the Chaos Monkey.
Often infrastructure fails and we do not want to tear it down; we not only wish to preserve it, we must. For years, I’ve been building systems with disposable units of compute. However, it is naive to think we can simply throw away VMs or containers: we want to preserve their state for archival and analysis. In fact, this is one of the properties of VMs which still make them better than containers: the ability to easily and affordably snapshot memory as an artifact.
Not to say we cannot do that with containers, but memory snapshots and suspend to disk are more mature with virtualization technologies than they are for Linux processes.
If anyone is still questioning the value of memory and disk artifacts from your “stateless” containers:
<<Security incident management>>
Think about a system where your response to a security incident is to shut down your hosts, VMs, or containers. With a fully and truly immutable infrastructure where we presume there is no local state ever worth saving, you’d have discarded your application memory and possibly even your disk contents, which contain invaluable information for your incident investigation. Thus, state is an essential part of your security story, and absolutely stateless infrastructure cannot be part of any serious security story.
The implicit state of a system should not be deemed disposable.
I should note that one need not abandon “immutable infrastructure” or 12-factor, only recognize that they are models that may not be complete, and that they may need to be used as a basis for your infrastructure strategy, not -AS- your infrastructure strategy.
Things will Change
Because everything has State, everything will change. That means your servers, your VMs, containers, and coffee machines. In fact, my espresso machine has a PID, meaning it has a temperature probe connected to a feedback loop that controls a relay to regulate temperature. The machine is constantly changing temperature, something it measures on an interval. That, of course, is State.
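For anyone who hasn’t met one, a PID (proportional-integral-derivative) controller is small enough to sketch. This is the textbook form, not the firmware of any real espresso machine: the gains, the 93 degree setpoint, and the toy boiler model in `simulate` are all assumptions chosen for illustration.

```python
class PID:
    """Textbook PID controller; the gains used below are illustrative, not tuned."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0      # accumulated error: state the controller carries
        self.last_error = None   # previous error: more state

    def update(self, measured, dt):
        error = self.setpoint - measured
        self.integral += error * dt
        if self.last_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.last_error) / dt
        self.last_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        return min(1.0, max(0.0, output))  # clamp to a 0..1 relay duty cycle


def simulate(pid, temp=20.0, ambient=20.0, steps=5, dt=1.0):
    """Toy boiler: heating is proportional to duty, loss to the gap above ambient."""
    readings = [temp]
    for _ in range(steps):
        duty = pid.update(temp, dt)
        temp += (2.0 * duty - 0.01 * (temp - ambient)) * dt
        readings.append(temp)
    return readings


readings = simulate(PID(kp=0.1, ki=0.01, kd=0.0, setpoint=93.0))
```

Every pass through the loop reads a sensor, updates the controller’s internal state, and changes the physical state of the machine. None of it is stateless.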
The Internet of Things is more than just about connecting and controlling Things. It’s about sensors and the artifacts of reading those sensors. Those sensors are detecting external change and influence. They are measuring input. That is the state of those things, of those sensors. It changes, will always change, and is in fact, the whole purpose for its existence.
We measure change and respond with effect to affect the state transition we desire.
Sometimes the State that changes has nothing to do with your Thing,
but is an artifact of knowledge.
We cannot control change, we can only effect it and manage it.
Micro-services will Change
Like other Things, Micro-services will expose an API and in that API, change will happen. The state of that service will change. An API request changes a field in a database. A user logs in and a token is created, that is state change. A log file is written… etc
This is different than the implicit change, yes? It is the explicit change that happens in a stateful service.
What’s amazing is how well Micro-services map to physical Things. One takes input from sensors, and the other takes input from an API… each time, of course, we have change.
APIs and sensors, both, are inputs.
Really, the biggest difference between micro-services and physical things is that those services and their containers, or VMs, are entirely virtual and may run on a larger physical thing. Just as a physical thing might have multiple sensors…
The Chaos Monkey Lives
Things will change.
Because Containers are Things, they too will change. Bitrot happens. Disks fail to make writes. Memory becomes corrupted. Out-of-memory killers run. Services are attacked and compromised.
We need to manage containers like we manage Things. Solutions that do not do this are not good long-term solutions for managing containers.
What I speak of has theorems and proofs to back it up.
Applying the CAP theorem, or Brewer’s Theorem as it is otherwise known, we can state this as the inability of a distributed system to provide the high availability and partition tolerance implied by immutable infrastructure and 12-factor design while also providing the consistency necessary for reliable state management in a fully disposable manner.
Purist immutable infrastructure and 12-factor design dictate stateless design. As we know, true stateless doesn’t exist. Now, the CAP theorem basically means that we cannot simply rely on logging, consensus algorithms, and other distributed computing solutions to kill the Chaos Monkey.
What it means is that we must accept we might lose state. In fact, what it means is that we’re guaranteed to lose state. Important state such as who stole credit card numbers from your database, or took off with a copy of your private key.
We must accept this.
Because we cannot kill the Chaos Monkey, we need to know how to collect its droppings. 12-factor and immutable infrastructure fail to acknowledge this.
You need to Manage Change, not prevent it.
Uh, Chef? Puppet? Salt? Ansible?
Let’s not fool ourselves. These tools are designed for the old paradigm. Not to say they are not useful. However, the companies that have designed and built these tools have done so with the perspective of providing or supporting centralized infrastructure for managing provisioning.
They are not designed for speed. They are not designed for micro-services. They are not designed for Things. They are designed for creating change, not for capturing it.
Some of the companies, or at least some of the engineers at those companies, have recognized the disruption. I’ve seen amazing effort from Chef and SaltStack, in particular, toward adapting their thoughts, if not their processes, to the disruption brought by the Internet of Things and containers.
However, to be honest, I don’t believe those companies necessarily need to evolve. They do something well and it might be good enough to leave well enough alone. They might become more niche, but they have a solid market base.
My fear is less that they’ll fail to evolve,
but that they’ll lose their identities in the process.
In fact, some of these tools are great at tasks such as scheduling and lifecycle management, even if we disregard their value for managing configuration.
Furthermore, and I’ll use Chef as an example as I know it best of all these solutions — Chef defines resources, which are simply objects, and work well as abstract units that map cleanly to the thought of managing Things.
Technically, for instance, one could today use Chef as a cloud orchestration solution. In fact, I understand this was the initial scheduler solution for the Deis project.
Regardless, this talk is less about the state of current tools and more about the tools we need tomorrow and the fact that the tools we have today don’t yet solve tomorrow’s problems.
I expect new competition to flourish around greenfield solutions for solving tomorrow’s configuration management problems.
The old guard configured the interface of Linux filesystems and processes, but the new guard will configure the interface of APIs.
New solutions will be to configure Things through their own APIs, rather than through custom agents.
The best thing is that this will work on all Things, all devices, and across Operating Systems — even Windows.
Unfortunately, our current tools don’t solve these problems. We can no longer abide configuration management tools that use specialized agents.
Changing those Things
To change the things of tomorrow, we’ll use their APIs.
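A minimal sketch of what agentless, API-driven configuration might look like follows. Everything specific in it is an assumption: the URL, the field names, and the idea that a Thing accepts a JSON PATCH of its configuration are hypothetical, and the `send` parameter exists only so the logic can be exercised without a network.

```python
import json
from urllib import request


def plan_changes(observed: dict, desired: dict) -> dict:
    """Return only the fields whose observed value differs from the desired one."""
    return {k: v for k, v in desired.items() if observed.get(k) != v}


def http_patch(url: str, body: dict) -> None:
    """Send a JSON PATCH using only the standard library."""
    req = request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )
    with request.urlopen(req) as resp:
        resp.read()


def configure(url: str, observed: dict, desired: dict, send=http_patch) -> dict:
    """Push the minimal change set to the Thing's own API; no agent involved."""
    changes = plan_changes(observed, desired)
    if changes:
        send(url, changes)
    return changes


# Usage with a fake transport, so nothing leaves the machine.
sent = []
applied = configure(
    "http://thermostat.example/config",        # hypothetical endpoint
    observed={"target_c": 20, "mode": "heat"},
    desired={"target_c": 22, "mode": "heat"},
    send=lambda url, body: sent.append((url, body)),
)
```

Nothing is installed on the device; the management side only needs to know the API and the desired state.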
OpenStack Heat is an interesting example of something that Changes Things today. It defines Resources, each of which may be considered a Thing. Each is managed independently and orchestrated together using a Heat Orchestration Template (HOT).
First, it is a good example of how compartmentalization provides flexibility, but it also highlights the complexity of managing many pieces. To solve this, Heat has what it calls Provider Templates. These allow creating what are effectively recipes which aggregate management of multiple things into a single resource — or Thing.
The worst thing I have to say about Heat is that OpenStack as a whole tends to be overly insular, making the ad hoc use of components such as Heat an uncommon exercise. Still, Heat supports standalone installation.
Last week, I spoke with Jesse Robbins, founder and former CEO of Opscode (now Chef), about the Chef Server implementation. Now, I haven’t confirmed it, but my understanding from our conversation was that Chef explicitly forms a graph relationship with its nodes…
In some ways, Chef might already offer many of the right things for the next generation, if only at a relatively local, non-global scale.
Beyond that, Chef also changes Things. It defines Ruby objects that map to Resources, where each resource is, effectively, a Thing. Now, the hierarchical model is different than we might like for the Internet of Things context, but the actual recipe pattern would be surprisingly well aligned should it be mapped not to local resources, but to distributed ones.
The future of management services
Yet, the next generation of web technologies will make aspects of Chef server, Puppet Server, Ansible and others obsolete.
Just as Hypertext provided an implicit graph, linking and building relationships between websites — our next generation of web technologies will offer an explicit graph to provide discovery and inventory.
If we do it right, discovery and inventory of services — of things — will be a built-in feature of that next generation technology. It will be here by default, rather than the exception to the rule.
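To make “an explicit graph for discovery and inventory” a little more concrete, here is a small sketch. It is an in-memory toy, assuming no particular protocol or standard: the `ThingGraph` class, the `hosts` relation, and the node names are all hypothetical.

```python
from collections import defaultdict


class ThingGraph:
    """Things as nodes, named relationships as edges."""

    def __init__(self):
        self._edges = defaultdict(set)  # (subject, relation) -> {objects}

    def link(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def related(self, subject, relation):
        return set(self._edges.get((subject, relation), ()))

    def discover(self, start, relation):
        """Follow one relation transitively to inventory everything reachable."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in self.related(node, relation):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


graph = ThingGraph()
graph.link("datacenter", "hosts", "rack-1")
graph.link("rack-1", "hosts", "vm-7")
graph.link("vm-7", "hosts", "container-a")
graph.link("vm-7", "measured-by", "power-meter-3")

inventory = graph.discover("datacenter", "hosts")
```

Inventory falls out of the relationships themselves rather than out of a separate database that someone has to keep in sync.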
Application-specific discovery and inventory mechanisms won’t go away,
no more than Gopher has.
Remember Gopher? It was pretty cool… until hypertext killed it.
Granted, it will only have an eventually consistent view of the universe, but that’s already true of all configuration management systems today. It’s true of the World Wide Web, and according to Brewer, true of all distributed, highly-available, and fault-tolerant systems — any such system that we can build with global reach.
Now you have two (trillion) problems
In the world where everything is connected, nothing is connected. When we saw computers coming, we built a hyper-connected world-wide-web. For those that remember it, hyper is a word that now feels so quaintly 90’s, or worse, archaic. However, it’s probably the right word for what we need today:
We cannot afford to build networks of things that look like Service-oriented-architectures. There are aspects of SoA that make sense in a hyper-connected world. Certainly, the World-Wide-Web terminates at webpages which are no longer static, but backed by N-tier applications.
I’m not saying we have no need for architecture in the World of Things, but that MQTT is not enough.
The MQ Telemetry Transport is a lightweight pub/sub protocol. I have nothing bad to say about MQTT, except to say that it’s incomplete. That’s probably a good thing.
MQTT does some things great. It provides a way of publishing sensor data to many clients. It provides a buffer between slow devices and fast clients. In most deployment models, it breaks out of NAT since we, as an industry, have failed to adopt IPv6.
What it does not do is provide command, control, or configuration. It’s web-pull, not web-control. Change and control mechanisms can be built on MQTT, but it’s just not good at it.
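For readers who have not used it, the pub/sub model looks roughly like this. The sketch is an in-memory stand-in rather than an MQTT client: the `Broker` class and topic names are hypothetical, and it ignores QoS, retained messages, and the special handling of topics beginning with `$`. The `+` and `#` wildcards follow the MQTT topic filter rules, assuming `#` only ever appears as the last level.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Match a topic against an MQTT-style filter with + and # wildcards."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, f in enumerate(f_levels):
        if f == "#":
            return True  # multi-level wildcard: matches the rest, and the parent
        if i >= len(t_levels):
            return False
        if f != "+" and f != t_levels[i]:
            return False  # "+" matches exactly one level; anything else is literal
    return len(f_levels) == len(t_levels)


class Broker:
    """Fans published messages out to every subscriber whose filter matches."""

    def __init__(self):
        self._subs = []  # list of (topic_filter, callback)

    def subscribe(self, topic_filter, callback):
        self._subs.append((topic_filter, callback))

    def publish(self, topic, payload):
        for topic_filter, callback in self._subs:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)


broker = Broker()
temps = []
broker.subscribe("sensors/+/temp", lambda topic, payload: temps.append((topic, payload)))
broker.publish("sensors/kitchen/temp", 21.5)
broker.publish("sensors/kitchen/humidity", 40)  # no subscriber matches this one
```

Notice that data only flows outward from the sensor. There is nothing here that addresses a specific device and tells it to change, which is the gap I am describing.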
MQTT solves important problems related to accessing sensor data,
but not all the problems of accessing and controlling things
If MQTT is our HTML, where is our HTTP? MQTT doesn’t provide a sufficient analogue to REST
Take a REST
The REST pattern has been important to building modern web APIs. However, those APIs are failing us.
REST implies that our services are resource-centric and we use a proper set of verbs for information retrieval and performing changes.
But unfortunately, the primary issue with REST is that it’s not a protocol. It is at best a guideline. That’s not strict enough for building a hyper-connected web.
Solutions such as WADL, Swagger, and API Blueprint seek to solve the deficiencies of REST. Yet, it’s not yet clear if they’re good enough and adoption remains key.
Finally, REST also implies idempotency.
Idempotency as a requirement for REST is a challenge for many services of the Internet of Things. Yet, we cannot necessarily presume machines can process requests fast enough to repeat them, nor do they have the memory to provide sufficient state modeling.
Protocols such as MQTT provide value as a buffer to support idempotency for REST access to Things, but this works better for retrieving data, rather than creating or updating it.
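The difference idempotency makes to a Thing is easiest to see with a retried request. In HTTP terms, PUT is defined as idempotent and POST is not. The `Valve` class below is a hypothetical device model, not any real API, and the method names merely echo those verbs.

```python
class Valve:
    """Hypothetical Thing with one bit of state and a wear counter."""

    def __init__(self):
        self.is_open = False
        self.actuations = 0  # how many times the hardware physically moved

    def put_state(self, is_open: bool) -> None:
        """Idempotent: declares the desired state, acts only if it differs."""
        if self.is_open != is_open:
            self.is_open = is_open
            self.actuations += 1

    def post_toggle(self) -> None:
        """Not idempotent: every call changes the state."""
        self.is_open = not self.is_open
        self.actuations += 1


# A client that times out and retries sends each request twice.
declared = Valve()
declared.put_state(True)
declared.put_state(True)   # the retry is harmless

toggled = Valve()
toggled.post_toggle()
toggled.post_toggle()      # the retry undoes the first request
```

The price of the idempotent form is that the device must remember and compare state, which is exactly what the smallest devices struggle to do.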
No, we need hyper-connected APIs because…
Things are Services (& services are Things)
My favorite Cloud is Uber. First, it makes Cloud really easy to explain to a layman, but it’s also just really cool. Uber has a cloud of vehicles. Each vehicle provides a service, managed by a driver. Users of the Uber cloud request resources and Uber schedules a resource to the user, provided by a service, delivered by a driver.
Long term, will services such as Uber and Lyft be powered by autonomous vehicles? Their cloud uses Things to orchestrate people, to provide a service. Tomorrow, it’s likely they’ll be orchestrating things from end-to-end.
To build these next generation services we need first-class events. MQTT does this well. We also need first-class command and control. That’s what HTTP already does well. Sure, unlike MQTT, it’s heavy… but change is a heavy process. It’s the sort of thing we only need to do on capable devices, not on your smallest microcontrollers. But there are other protocols as well, such as CoAP, a lightweight protocol that translates to HTTP.
My intention here isn’t to mandate which protocols we use,
but to raise awareness for the need of such protocols and their adoption.
Protocols such as WebSockets, SPDY, and HTTP/2 provide multiplexing of multiple channels over a single TCP connection. In a sense, we might expect these to be a sort of new document-type, or proto-type, if you prefer.
MQTT itself is a protocol, not a document-type… but if it doesn’t do all the things we need, we either need a different protocol, or multiple.
We can either run multiple protocols over different ports, which would be horribly complex — not to mention painful, or we can decide on a single wrapper protocol.
This will allow us to provide native HTTP, CoAP, or other protocols alongside one another.
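A wrapper protocol of this kind comes down to framing: tag each chunk of bytes with the channel it belongs to. The layout below is invented for illustration and is not how WebSockets, SPDY, or HTTP/2 frame their data. The six-byte header, the channel numbers, and the payloads are all assumptions.

```python
import struct
from collections import defaultdict

# Hypothetical frame header: channel id (uint16) and payload length (uint32),
# both in network byte order.
HEADER = struct.Struct("!HI")


def encode_frame(channel: int, payload: bytes) -> bytes:
    """Prefix a payload with its channel id and length."""
    return HEADER.pack(channel, len(payload)) + payload


def decode_frames(stream: bytes):
    """Yield (channel, payload) pairs from a buffer of complete frames."""
    offset = 0
    while offset < len(stream):
        channel, length = HEADER.unpack_from(stream, offset)
        offset += HEADER.size
        yield channel, stream[offset:offset + length]
        offset += length


# Two logical protocols interleaved on what would be one connection.
wire = (
    encode_frame(1, b"GET /state")   # channel 1: request/response traffic
    + encode_frame(2, b"temp=93.1")  # channel 2: telemetry events
    + encode_frame(1, b"200 OK")
)

channels = defaultdict(list)
for channel, payload in decode_frames(wire):
    channels[channel].append(payload)
```

One port, one connection, and each protocol still sees only its own bytes.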
I’m sure, of course, everyone has seen this…
Every time you create a new standard to solve a problem, you have two problems
I can’t say what the standards will be. MQTT might not be the right solution at all. Better REST might not be a solution. WebSockets, SPDY? I don’t know.
What I do know is that we have problems that are not addressed by the current protocols and current solutions.
Upgrade The Internet
To accomplish the goals we are seeking, we need to Upgrade The Internet. That might read a bit bombastic…
But we want to do more. We want devices to connect. We want those devices to know not just who to speak to, but how to speak to others.
/Tim Berners-Lee/: “Google could be superseded by the Semantic Web”
Tim Berners-Lee coined the term ‘semantic web’, which he defines as “a web of data that can be processed directly and indirectly by machines”.
The time of the Semantic web has come. In fact, we need more than a simple semantic web: We need semantic APIs as well.
Recognizing that what we want is a Semantic Web gives us context for researching and understanding the efforts that have come before.
Note that I just said, “a” semantic web. Let’s not commit to any protocols or document-types just yet!
Sometimes called the Web 3.0, the Semantic web seems to be the thing always on the horizon — we’re always on the cusp of it, but we can never achieve it. Companies live and die by its dream.
Initial efforts to build a semantic web created the Resource Description Framework (RDF) and the Web Ontology Language (OWL), and of course were highly coupled to XML.
In an industry that has gained disdain for XML in favor of more manageable formats such as JSON and YAML, to the degree that even the W3C effectively backtracked on XHTML with the publishing of HTML 5, and given that these technologies have failed to gain currency in the common vernacular of web developers… you can imagine that
The RDF and OWL efforts have been near complete failures.
Of course, there are also tons of articles on how the Semantic web has failed, why you shouldn’t use it — in its current form, etc.
That isn’t to say the concept and model isn’t right. It was simply too early. The web wasn’t ready. We needed web 2.0 before building web 3.0.
The Semantic Graph of Things
The Semantic Web should not just express the context of things, but provide discoverability.
In fact, it’s been argued that DNS already solves these problems.
Yet, it’s flawed. Critically flawed.
DNS is hierarchical, which is okay, but queries are also hierarchical, with queries for record types (i.e., relationships). It’s designed for a small number of relationships per node, with responses returned via a single packet. Yes, one single packet. EDNS raises the size limit on UDP responses, and DNS over TCP allows larger responses still, but only as long as they fit within the 64 kilobytes that a DNS message can describe.
Of course, large UDP packets are not likely to get to their destinations, because they fragment along the way. Beyond that, amazingly few engineers seem to remember that an IP packet is limited to 64 kilobytes. At a typical 1500-byte MTU, that’s well over 40 fragments for each packet, every one of which has to be successfully routed.
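The fragment count is simple arithmetic and worth checking. The numbers below are assumptions: IPv4 with a 20-byte header and no options, a 1500-byte Ethernet MTU, and a packet at the maximum size the 16-bit length field allows. Other link types give different answers.

```python
import math

MAX_IP_PACKET = 65_535  # IPv4 total length is a 16-bit field
IP_HEADER = 20          # assuming no IP options
MTU = 1_500             # assuming a typical Ethernet MTU

payload = MAX_IP_PACKET - IP_HEADER        # bytes that need to be carried
per_fragment = (MTU - IP_HEADER) // 8 * 8  # fragment data comes in multiples of 8 bytes
fragments = math.ceil(payload / per_fragment)
```

Under these assumptions the count comes out at 45, the same ballpark as the figure in the text, and losing any one of those fragments loses the whole packet.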
Yet maybe 64K ought to be big enough for anybody. Maybe not. Even if we agreed to work within these constraints, 64 kilobytes might just be enough — as long as we are comfortable making trade-offs and hacks.
But do we really want to make those hacks? I say not.
Instead, we probably need new, smarter protocols.
Building a new Web
There are a number of individuals and vendors alike attempting to find solutions to bridge the gap between today’s protocols and tooling and tomorrow’s world. Most are seeking to make it easier to build, support, and manage applications. Few of them appear to be seeking to build a global, hyper-connected web.
One of the few efforts is next week’s W3C Workshop on the Web of Things.
Others are now working on a project called libswarm, which originated at Docker. Libswarm and swarmd seek to solve the problems of providing service connectivity. Libswarm has a design such that it can be backed by services such as MQTT. Where you might have used RabbitMQ before, you would now use MQTT, and where you once used HTTP, you’ll use Swarm. That is, Swarm isn’t the solution, it’s a Lego piece that helps build a wider solution, just as TCP and HTTP are Lego pieces of the Web today.
Swarm builds on libchan, which builds on top of existing
protocols such as SPDY and HTTP/2, or WebSockets.
Swarm is not semantic, not yet, anyway. However, it can provide an interface to semantic APIs. It’s HTTP to your Hypertext. The problem is that while MQTT is semantic, it isn’t explicitly hyper-connected.
Of course, we — the community — are working on it.
Again, this isn’t to say Swarm must be the future. It’s too early to pick winners, and it’s not about winning the battles over protocols — it’s about winning together.
The key point here is collaboration and agreement on goals.
So far, most or all of the other projects and efforts appear to be silos. They are not hyper-connected, they are not semantic. They are just brokers for connecting devices with no apparent aspirations beyond those humble beginnings.
Regardless, it’s not a competition. The point isn’t to make any single solution win, but to find A single canonical solution, just as we did with TCP, HTTP, and HTML.
At worst, it might be disruptive to companies building products around silo-based solutions, but it’s not competitive — it’s evolutionary.
The New Internet
Needs to be about all Things, not all Devices. Where all things are services and all services are Things. Many of those services will be hosted by the infrastructure and servers we know and love today.
We need new services and configuration management solutions that work across devices, clouds, and containers.
And we need to bet on free, open, and semantic hyper-connected protocols.
Today, we are changing our servers, but tomorrow — We want to change the world.