Federated Clusters with Docker Swarm

TL;DR Federated clustering overview with a focus on Swarm. Includes architecture diagrams and tools for building an experiment in AWS. Swarm’s API is a great building block that helps you create much more sophisticated deployment architectures or scale/diversify underlying infrastructure. Whale-Mullet is a Swarm fork I built to make the whole thing work.

Docker Swarm provides an abstraction that allows a user to treat a cluster like a single node. That works because the Swarm API is mostly compatible with the Docker API. It raises the question, “If I can treat a Swarm like a single machine, can I create a Swarm of Swarm clusters?” This is called cluster federation. This article describes how I tried to build a federated Swarm cluster, what problems I encountered, how I made it work, and the limitations of the system.

Is this meme still a thing? 2007 was a long time ago.

Other than random curiosity, there are a few reasons why you might build a federated cluster. You might want to create a multi-cloud deployment topology, isolate stages of your deployment pipeline while maintaining control through a centralized hub, or provide multi-tenancy for a PaaS. You might also want to build a cluster that is simply bigger than anything these clustering technologies support. Until Kubernetes 1.1.2 the largest supported cluster size was 250 nodes, and 1.1.2 only increases that to 1000 nodes. From a “large-scale server software” perspective 1000 nodes is massive, but that is a fairly narrow view of the world. 1000 nodes is nothing if we are talking about networked consumer products or IoT (sensor networks, vending machines, ATMs, thermostats, cars, etc.).

If you’re new to Docker or Swarm please consider checking out my book on the topic. The final chapter covers working with containers on Swarm clusters in detail. Your support makes this content possible.

Callouts, Considerations, and Ubernetes

The configuration described below does not create networks that span the set of all local clusters. While each local cluster might implement some form of multi-host networking the logical federated cluster does not.
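For example, an overlay network created on one local cluster is visible only within that cluster; the federated layer has no network primitive of its own. A quick sketch (the endpoints are illustrative, taken from the addressing used later in this article):

```shell
# Create an overlay network on local cluster A
docker -H tcp://10.0.1.10:3376 network create --driver overlay spanning_a

# Listing networks on local cluster B will not show "spanning_a";
# each local cluster keeps its own network scope
docker -H tcp://10.0.2.10:3376 network ls
```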

It is worth mentioning that people have been doing this with Kubernetes already (at least experimenting with it). This article will not go into the same fantastic detail as the Ubernetes proposal, but it will provide a great starting place.

What is in a Swarm Cluster?

A Swarm based cluster is made up of three main components: a Swarm manager node (or nodes), a set of Docker nodes, and a node discovery mechanism like a key-value store. There are a few popular and open source key-value store options. In this article I’ve used etcd, but could have just as easily used Consul or ZooKeeper. The cluster works because all components can talk to each other via well-known interface specifications. Docker nodes implement the Docker API. All nodes have an etcd driver that can register endpoints with etcd. The Swarm manager exposes the Swarm API — which is mostly compatible with the Docker API — to clients. Docker nodes join the cluster either by using libnetwork at daemon startup, or by running a Swarm container in join mode.
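For reference, the two join mechanisms look something like the following. Addresses are illustrative; substitute your own etcd endpoint and node address:

```shell
# Option 1: register with discovery at daemon startup (libnetwork flags)
docker daemon \
  --cluster-store etcd://10.0.0.10:2379 \
  --cluster-advertise eth0:2375

# Option 2: run a Swarm container in join mode alongside the engine,
# advertising the engine's address to the key-value store
docker run -d --name swarm_join --restart always swarm \
  join --advertise 10.0.1.11:2375 \
  etcd://10.0.0.10:2379
```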

Building a Swarm of Swarms

Building a federated cluster typically requires some kind of specialization at higher levels. You might expect to find a special federation scheduler or higher-level deployment abstractions. Building a federated Swarm cluster relies heavily on delegation of responsibilities and hinges on treating each local cluster like a single logical node.

Federation should be simple if we can treat local clusters like logical nodes.

Notice in the illustration above that all of the components of a single cluster are present. There is a single key-value store/cluster, a single (or high-availability) manager node, and two logical Docker nodes that have registered with the key-value store. This illustration also includes a client node that should be able to interact with the federated cluster just like a standard Swarm cluster or single Docker node.

The illustration below reveals the detail behind the logical node abstraction and shows exactly what needs to be built.

An experimental federated Swarm cluster in AWS.

Each local cluster is a standard libnetwork or Swarm cluster. The only difference is running an additional “swarm join” container on the manager node that uses the federated cluster key-value store and advertises the Swarm manager port instead of the Docker engine port.
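Concretely, the extra container on each local manager might look something like this. The federated etcd endpoint and the manager address are illustrative; note that it advertises port 3376 (the Swarm manager) rather than 2375 (the engine):

```shell
# On the manager node of local cluster A: join the *federated*
# key-value store, advertising the Swarm manager port so the
# whole local cluster registers as one logical node
docker run -d --name federated_join --restart always swarm \
  join --advertise 10.0.1.10:3376 \
  etcd://10.0.0.10:2379
```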

This proof of concept is configured for rapid iteration, open inspection, and easy experimentation. It does not include appropriate security group configuration, network isolation, load balancer integration, or special roles for access to other resources.

Trying it out

I launched a federated cluster in AWS using a CloudFormation template. After accessing the terminal on my bastion host I used the docker info subcommand to check the cluster health of each local cluster and the federated cluster:

# Check local cluster A status
docker -H tcp://10.0.1.10:3376 info
# Check local cluster B status
docker -H tcp://10.0.2.10:3376 info
# Check federated cluster status
docker -H tcp://10.0.4.10:3376 info

I found that the local clusters had come up healthy. The federated cluster had come up and the local cluster managers had joined as nodes but they were stuck in a pending state. I knew that I had come across one of a few minor differences between the Docker Remote API and the Swarm API. This does not work out of the box.

Nodes: 2
 (unknown): 10.0.1.10:3376
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels:
  └ Error: (none)
  └ UpdatedAt: 2016-03-11T18:07:18Z
 (unknown): 10.0.2.10:3376
  └ Status: Pending
  └ Containers: 0
  └ Reserved CPUs: 0 / 0
  └ Reserved Memory: 0 B / 0 B
  └ Labels:
  └ Error: (none)
  └ UpdatedAt: 2016-03-11T18:07:18Z

There are documented API incompatibilities and Swarm has a check to make sure that each node is actually an engine. But the APIs are fundamentally compatible. The Swarm API only adds information, makes minor data structure changes, and reports different version information. Ideally, Swarm would behave like a proper building block and stack. It doesn’t, leaving two options: fork Swarm or build an adapter.
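You can observe one such difference yourself by comparing the version endpoints of an engine and a Swarm manager (addresses are illustrative, and the exact fields vary by release):

```shell
# An engine reports its own version information...
curl -s http://10.0.1.11:2375/version

# ...while a Swarm manager reports a "swarm/x.y.z" style version,
# which trips the engine validation check in a manager of managers
curl -s http://10.0.1.10:3376/version
```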

Whale-Mullet — Docker in the Front, Swarm in the Back

A real solution would either make the APIs truly compatible or make the manager of Swarms capable of speaking the Swarm API. In this case I forked Swarm for a proof of concept and made a few adjustments to the response structures. In other words, I made a Swarm manager lie about being an engine. In my experience a fork of a project like Swarm is a bad thing to own long term, but it is handy for a demo.

Arrows indicate flow of control. Key-value stores are not pictured.

The changes in Whale-Mullet subvert API version checking by delegating to the local Docker engine for a few specific calls. It also aggregates cluster information and presents it as if it is a single engine.

Whale-Mullet is a quick hack that works as a proof of concept. Try it for yourself by replacing your local Swarm managers with the allingeek/whale-mullet image. You need to add two additional arguments to your standard Swarm manager startup:

docker -H $LOCAL_SWARM_MANAGER_A run -d \
  --name manager_a --restart always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --privileged \
  --net host \
  allingeek/whale-mullet \
  manage -H tcp://0.0.0.0:3376 \
  --strategy spread \
  etcd://10.0.0.10:2379

I started up a demo Swarm-of-Whale-Mullets in AWS and ran a few tests. The following screen captures show the operational federated cluster. The top-level Swarm manager is running on 10.0.0.22:3376; local Whale-Mullet cluster A on 10.0.0.20:3376; local Whale-Mullet cluster B on 10.0.0.21:3376.

Running “docker info” on one of the Whale-Mullet endpoints shows the typical output you’d see from a two node Swarm cluster. It is up and healthy. It has two (t2.micro) engines running on randomly assigned private IP addresses.

Note the IP addresses of the Whale-Mullet nodes as healthy members of the cluster.

Running “docker info” on the Swarm manager shows an opaque view of the world. Each member of the federated cluster looks like a single machine. Container and image statistics are appropriately rolled up. This proof of concept did not address resource reporting and so CPU and memory numbers are incorrect. A real implementation would need to fix those gaps.

You might be thinking, “Okay, docker info works, but does it run anything?” I’m happy to say that it does!

The image above shows a “docker run” command being issued to the top-level Swarm manager. It demonstrates that the streams are correctly redirected through both levels of abstraction. The subsequent “docker ps” command also demonstrates how Swarm’s node prefixing of container names stacks correctly onto a federated cluster.
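To try that yourself, run a container through the top-level manager and list it. The address is illustrative, and the exact node-name prefixes depend on what you named your managers and engines:

```shell
# Dispatch a container through the federated manager; it is
# scheduled onto one of the local clusters transparently
docker -H tcp://10.0.0.22:3376 run -d --name hello alpine ping localhost

# The container name comes back with stacked node prefixes,
# e.g. something like manager_a/node-2/hello
docker -H tcp://10.0.0.22:3376 ps --filter name=hello
```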

I put this thing through a fairly complete functional test and most features worked great.

Is this silly? Is it awesome? I can’t tell…

As everyone gets used to orchestration platforms, distributed computing schedulers, and piling abstractions on top of abstractions I think cluster federation is an inevitable concern. Kubernetes has made real efforts toward federation and others will follow. Keeping clusters small and federating helps those platforms grow horizontally without abandoning the strong consistency they need for operation.

Whale-Mullet is absolutely silly. But it is an awesome demonstration of what can be accomplished with composable services. Swarm really isn’t that far off, but I’m not sure this is a direction Docker wants to take the project.

If you’re interested in learning more about Docker or Swarm please check out my book, Docker in Action, and help support my development of other articles like this one.