There is a growing trend in the SysOps and DevOps communities to use Linux containers for deploying applications. Compared to the traditional way of putting things into production, containers provide a lot of conveniences. Today we use traditional Linux distributions inside containers. It works, but it has a lot of drawbacks. In this article, I try to survey the downsides of that approach and sketch a design for a new Linux distribution built specifically for containers.
Disclaimer: I’m the author of the vagga containerization tool, and I’ve written about containerization multiple times. This is one of those rare articles which describes a vision of something I don’t have time to implement myself. I’m writing this mostly to find someone who can pick up the work, or a company that can fund the research needed to make it a reality.
There are many use cases for containers. In this article, I’m mostly concerned with running server applications. Other things, like development containers or running desktop software, can also benefit from what is described here, but I won’t cover them in depth, for brevity.
Another point is that I’m discussing the distribution inside the container itself. The host distribution might need changes too; you may want to look at CoreOS or one of the few other alternatives if you are interested in that topic.
The main point is that the goals of an end-user distribution are very different from what you expect in a container. The main goal of a traditional distribution is an easy user experience, which means:
- Everything works out of the box
- Starting a service is a matter of installing a package
- Little configuration
- Easy upgrades
- Automatic handling of dependencies
- Automatic initialization of storage directories (like `/var/lib/mysql`)
It isn’t a coincidence that easy-to-use distributions like Ubuntu obey these rules, as well as hardcore ones like Arch or Nix. Arch doesn’t run a service on behalf of the user, but it prepares the config so the service can be started with a single command. Nix has it reversed: you enable a service and the package is installed automatically. Everything else is very similar.
But for production applications, the requirements are vastly different:
- Freeze versions of everything (including dependencies)
- Do not run anything unexpected
- All configuration files are in revision control
- Even environment variables are carved in stone
- Never touch storage dirs like `/var/lib/mysql`
Sounds familiar, right?
The Perfect Package Manager
Here is a quick round-up of the features, just to give you a feeling for it.
- Reproducible builds from day one
- No configuration and user management at all
- Plain tarball packages, no install scripts
- Supports Filesystem Hierarchy Standard (FHS)
- Full support for Python, Ruby, Node.js, and other languages
- Simple package format (probably even simpler than in Arch or Alpine)
- Out of tree package manager (and bookkeeping)
- Own package and image hosting
The obvious things (like removing the kernel and docs) are omitted. Before discussing the gory details, I want to stress that all of the above should be supported fully, because we don’t need another distribution that only works partially. And remember, we are building containers: if some feature is not in our new distribution yet, we can spawn a Debian container just for that single service.
Let’s walk through each feature and see why it is so beneficial.
Reproducible Builds
By reproducible builds, I mean that a package can be built by several independent parties and produce exactly the same result (a byte comparison yields no differences).
This feature is the most important. All distributions are trying to do reproducible builds nowadays, so it would be embarrassing not to have them in a new distribution. But there is a more important reason.
Reproducible builds allow us to do secure user-contributed binary packages.
Every Linux distribution needs some sort of user-contributed repository. Usually, you need to trust the user who submits a package that the package is okay (i.e. it works and doesn’t contain malware). The Arch Linux User Repository makes it easy to inspect package contents but doesn’t contain binary packages. Ubuntu Personal Package Archives have binary packages that are built by a trusted build cluster, but Debian packaging doesn’t make it easy to inspect the package sources. Another issue is that package contributors sometimes give up on a particular package.
Imagine a simple package format that anyone can build and submit as a binary. At first, you don’t need to trust anyone: if you have built some package on your desktop, the software automatically verifies that the central repository has exactly the same contents. Then you may install it on a laptop or on a server, or recommend it to friends.
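To make this concrete, here is a minimal sketch of such client-side verification. The function names and the idea of fetching the repository’s digest are my own assumptions, not part of any existing tool:

```python
import hashlib
from pathlib import Path

def package_digest(path):
    """Hash the package tarball. With reproducible builds, independent
    builders produce bit-for-bit identical tarballs, hence equal digests."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def matches_repository(local_package, repository_digest):
    """True if our local build is byte-identical to the one the central
    repository serves -- no trust in the original uploader required."""
    return package_digest(local_package) == repository_digest
```

If the digests match, you have independently confirmed the binary; if they don’t, either the build is not reproducible or someone tampered with it.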
But if some big SaaS that you trust signs a package, you can be confident that the package is good enough. For example, a continuous integration service might publish its own packages so that users have a chance to reproduce bugs more easily. Or another provider might sign the list of packages that are distributed as part of its enterprise solution (for in-house installation).
When it’s cheap to build and verify a package, I believe everyone will do it. It’s especially useful combined with a service that distributes full images. More about that below.
Also, note that for server usage we need frozen versions of everything. This basically means that many users will need, say, Python 3.3, 3.4, and 3.5 in the same distribution (where most current distributions, with the known exception of Nix, provide only a single officially supported version). This is where crowdsourcing will help a lot.
No Configuration and User Management
Well, for servers and services we don’t want our configuration to be managed by the package manager. We have the following tools instead:
1. Configuration management tools (ansible, chef, puppet, …)
2. Version control systems
3. Configuration generators (confd, consul-template, …)
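As a sketch of the third approach, a config generator is little more than substituting values from the environment (or a discovery service) into a template at container start. The template and variable names below are invented for illustration:

```python
# Hypothetical server config template -- pure illustration, not a real format.
TEMPLATE = """\
listen {port}
upstream {upstream}
"""

def render_config(template, env):
    """Render the configuration at container start from environment
    variables, instead of letting a package manager own the config files."""
    return template.format(port=env["PORT"], upstream=env["UPSTREAM"])
```

The package ships no config at all; the entry point renders it from values that are, as noted above, carved in stone per deployment.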
The traditional techniques for dealing with user accounts are also useless for containers. We don’t need a shell profile and other stuff in the home directory. Usually, containers are made read-only using mount options, so filesystem permissions are also underused.
The only important thing about permissions is whether a user is root or not. So usually there are only two users in a container: `0` and `1`. And since user ids are just numbers, we don’t even need an `/etc/passwd` for most containers.
Plain Tarball Packages
By my observations, the Arch and Alpine Linux distributions have far fewer install scripts compared to Debian. If I remember correctly, it’s impossible to make an installation script in Nix at all. So I think we can afford something similar here.
Some installation scripts may be replaced by custom packages: e.g. you may create a package that contains a custom `ca-bundle.crt` instead of using an installation script for it.
Some are just useless for our use case: you never want to initialize `/var/lib/mysql` at package installation.
And a few installation scripts are really things that should run at the configuration phase, after the image is composed and mostly done.
It’s important to keep packages as a plain set of files, so that packages can be composed on top of each other like Docker layers, except that every layer must be distinct: no file conflicts and no deletions allowed.
If we don’t allow conflicting files, the image is just an unordered set of packages (but taking dependencies into account). I think this should make building and testing images much less work for tools (because they don’t need to build every combination of packages).
And yes, I believe it’s possible to organize packages in a way that there are no conflicts (except obvious ones: python3.5.0 and python3.5.1 are expected to conflict). And finally, if you can’t achieve something with unordered layers of packages, you can run any command as the last step of the build process (which is usually used for rendering configuration). Even if you composed a complex image that requires multiple installation scripts from several independent packages, it’s okay to require putting together the list of scripts manually.
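A sketch of that composition rule, with a hypothetical helper of my own invention: because conflicts are forbidden, composing packages is just a union of their file lists, and the result does not depend on package order:

```python
import tarfile

def compose_image(package_paths):
    """Union the file lists of plain-tarball packages into one image.
    Since no two packages may provide the same file, the result is the
    same in whatever order the packages are given."""
    providers = {}  # file path -> package that provides it
    for pkg in package_paths:
        with tarfile.open(pkg) as tar:
            for member in tar.getmembers():
                if member.isdir():
                    continue  # shared directories are fine
                if member.name in providers:
                    raise ValueError("conflict on %s: %s vs %s"
                                     % (member.name, providers[member.name], pkg))
                providers[member.name] = pkg
    return providers
```

A build tool only ever has to check pairwise file lists, never run scripts in a particular sequence.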
Filesystem Hierarchy Standard
This is just to mention that I like FHS a little bit more than what Nix does. While I’m using Nix for my desktop needs, I believe its layout is a little bit harder to reason about for server needs. Also, we have everything containerized, with as little as possible in a single container, so we don’t need (and explicitly don’t want) multiple versions of a single package in one container.
Full Support of Different Languages
In the real world, every programming language has its own package manager. Still, many packages need native dependencies which are not tracked by the language’s package manager at all, yet tracking them is essential for reproducible package builds.
Usually, it’s possible to repack Python packages (as an example) into another format. Also, Nix seems to do a great job of repackaging all (or most of) the packages on the Python Package Index. We should learn from them.
And reproducible, signed, and cached builds of large Python packages (like numpy or scipy) are very nice to have.
Simple Package Format
The possibility of creating a new package easily is hard to overestimate. Arch Linux has a much simpler package format, PKGBUILD, which is basically a single bash file. As a result, it has more user-contributed packages than the much more popular Ubuntu.
Since we want to get rid of installation scripts, we might have even simpler package files. But unlike in Arch, where a package file is just bash with a couple of variables and functions, we must have a format whose metadata can be parsed without evaluating untrusted code.
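For example, the metadata could be plain declarative key-value data, which any tool can read safely, unlike a PKGBUILD that has to be sourced by bash. The manifest fields and the URL below are my own invention, purely for illustration:

```python
# A hypothetical declarative package manifest: pure data, nothing executable.
MANIFEST = """\
name = jq
version = 1.5
source = https://example.org/jq-1.5.tar.gz
depends = oniguruma
"""

def parse_manifest(text):
    """Read the metadata without evaluating any untrusted code."""
    meta = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        meta[key.strip()] = value.strip()
    return meta
```

A repository index, a security scanner, or a build bot can all consume this with a ten-line parser and zero risk of running contributor code.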
A simple package format is also very important so that anybody can just bump the version and rebuild the package, instead of the current common practice of building from sources and skipping the package manager just for a minor version upgrade.
On the other hand, reproducible builds might need more boilerplate for most of the packages. This requires some experiments to get right.
Out of Tree Package Manager
I mean we don’t need a package manager inside the final image, nor any index files, package caches, and so on. Nowadays we usually clean package caches after installation and never touch the package manager again.
Vagga has a few special commands like `Git`, `Tar`, and similar, which check out a repository or unpack an archive without installing git or tar into the container itself. This may be extended to installing packages.
At first sight, this is merely an optimization. But remember, we want to treat an image as an (unordered) set of packages. If we put the package manager’s indexes into an image and later remove them, that assumption becomes false. The proposition is even sillier when indexes are updated during the installation of packages.
Another good point is that if you don’t need installation scripts inside the container, you probably don’t need bash and other command-line utilities either. You may put in a single Go or Python program, and that’s it. So even if someone breaks into the system, they can’t run a shell.
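Here is a sketch of such out-of-tree installation (an assumed helper, not an existing tool): the host unpacks the package straight into the container’s root filesystem, so neither the package manager nor its indexes ever appear inside the image:

```python
import tarfile
from pathlib import Path

def install_into_rootfs(package, rootfs):
    """Unpack a plain-tarball package into a container rootfs from the
    host side; no package manager, shell, or index files enter the image."""
    Path(rootfs).mkdir(parents=True, exist_ok=True)
    with tarfile.open(package) as tar:
        for member in tar.getmembers():
            # refuse absolute paths and traversal out of the rootfs
            if member.name.startswith("/") or ".." in member.name.split("/"):
                raise ValueError("unsafe path: " + member.name)
            tar.extract(member, rootfs)
```

Since packages are plain file sets with no scripts, this is the entire "installation": nothing needs to run inside the container at all.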
Own Package and Image Hosting
Sure, every Linux distribution has package hosting. But we have a couple of special needs.
First, for containers we need hosting of images. And we don’t want images to be large uncontrolled binary blobs (like on Docker Hub). We want an image to be a set of packages, each of which is signed separately, with the image signed as a whole.
This is not only a convenience. Here are a couple of useful features:
- Tracking compatibility between packages is easier: if they are used together in many images, they are probably compatible
- Searching for vulnerable images quickly (there is some evidence that “Over 30% of Official Images in Docker Hub Contain High Priority Security Vulnerabilities”, but there is no good way to find and deprecate all of them)
- Saving disk space. You don’t need to store large binary images; an image is just a set of package identifiers
- Saving download bandwidth. Some packages might be downloaded from a local cache.
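Under those assumptions (the manifest layout here is invented for illustration), an image on such a hosting service is just a signable list of package digests, and a client only downloads the packages its local cache is missing:

```python
import hashlib
import json

def image_manifest(package_blobs):
    """An image is a (signable) sorted list of package digests,
    not one big binary blob."""
    return json.dumps(sorted(
        hashlib.sha256(blob).hexdigest() for blob in package_blobs))

def missing_packages(manifest, local_cache):
    """Only these digests need downloading; everything else is served
    from the local cache -- the disk and bandwidth savings listed above."""
    return [d for d in json.loads(manifest) if d not in local_cache]
```

The same digests also drive the vulnerability search: finding every image that contains a bad package is a lookup, not a scan of opaque blobs.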
The overview above doesn’t pretend to be comprehensive, but hopefully it was inspiring. There are a few more questions I’d like to cover.
Why not Nix?
The problem with Nix is not only the non-standard system layout. The main issue is that it’s hard to lock a specific version of a package.
For the unprepared reader, I should briefly describe how Nix works. Nix builds a package from a git repository which contains the configuration of all the packages. To be able to install packages quickly, Nix has a binary cache which holds builds of the current version of all packages.
So to freeze a package version, you need to lock the revision of the git repository. And then you quickly run into the situation where the binary cache you relied on has already been replaced, so you need to build everything from scratch (including gcc, glibc, and so on). This effectively means you can’t freeze a package without running your own build bot (called “hydra”) and cache, which is okay for larger organizations but too cumbersome for individual developers.
How This Compares to Docker and Runc
Well, as I’ve superficially shown above, the functionality Docker has now is much more limited than what I propose. Still, you would be able to use these packages much more simply than what you use now with Ubuntu or another distribution. You could probably use a package with the `ADD` instruction in a Dockerfile (even if it required the package registry to transform the original tarball).
Also, the packages described here are very similar to what is described in the runC spec. Hopefully, you could use packages as filesystem layers directly.
As I’ve shown, our requirements for package management in containers are very different from what we have in current Linux distributions. While it’s totally okay to use Ubuntu or Red Hat inside containers, we have a way to make things better.
While the initial investment in this kind of system is large, the impact on the ecosystem is also huge. Creating a well-designed OS distribution specifically for containers would let us use smaller, simpler, and more secure containers. It should take the crowdsourcing of packaging to a whole new level by making it easy to contribute while keeping it much more secure than what we have now.