Toward a Production-Ready Docker Swarm Cluster with Consul

Wherein I explore the requirements of, and develop a repeatable process for, standing up a moderately opinionated, production-ready Docker cluster using the community standard Engine, Machine, Swarm and Compose. HashiCorp Consul will be used as the key-value store for Swarm as well as providing a common discovery mechanism across all nodes.

Bleeding Edge?

When overlay networking was released with Docker 1.9 and Swarm 1.0 I noticed a mini-explosion of articles describing how to set up Swarm clusters to leverage this excellent new feature.

These articles were pretty good introductions to clustering with Docker and Swarm but they were just that, introductions. I found myself searching for more information on the underlying key-value stores, on which Swarm has an external dependency via docker/libkv, and always came up short. What about high-availability and/or resiliency? Does the KVS actually need to be external to the cluster? What are the most common best-practices/assumptions that one should emulate when standing up a KVS and/or Swarm cluster? How does one stand up clusters in a repeatable fashion?

Welp, here goes!

Guiding Principles

  • Highly available and fault-tolerant key-value store
  • Highly available Swarm masters
  • Fully functional overlay networking
  • Repeatable, automation-ready setup
  • All cluster and node services delivered as containers
  • Smarter-than-default logging
  • Memory accounting configured in kernel
  • Secure communication between Consul nodes
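
As a rough illustration of the memory-accounting item, here is a minimal sketch that assumes a Debian/Ubuntu node booting via GRUB. The helper name and the way it edits the file are my own illustration, not taken from the project's scripts; the kernel parameters themselves (cgroup_enable=memory swapaccount=1) are the ones Docker documents for enabling memory and swap accounting.

```shell
# Hypothetical helper: add cgroup memory/swap accounting flags to a GRUB
# defaults file. Assumes a line of the form GRUB_CMDLINE_LINUX="..." exists.
enable_memory_accounting() {
  grub_file="$1"
  # Do nothing if the flags are already present.
  if grep -q 'swapaccount=1' "$grub_file"; then
    return 0
  fi
  # Prepend the flags inside the existing quoted value, via a temp file.
  sed 's/^GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1 /' \
    "$grub_file" > "$grub_file.tmp" && mv "$grub_file.tmp" "$grub_file"
}
```

On a real node this would be followed by something like `enable_memory_accounting /etc/default/grub && update-grub` and a reboot.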

If you care to follow along I have made the sources available on GitHub.

Architectural Assumptions

Inspired by the single-node diagram at Docker’s Engine Overview page, our setup will look something like this:

Consul will provide the KVS that undergirds our cluster. We will be running Consul agents on every node in the cluster, which means that we can have a consistent, node-local address to provide to the Docker Engine/Swarm running on each node. We will also be leveraging Consul’s DNS support to enable service discovery via SRV records served to clients within the cluster as well as to clients on the cluster edge.

As I have attempted to show in the diagram above, Consul is deployed as a common pillar across the entire stack. This means that service discovery is bootstrapped into the cluster from the ground up. How is this achieved? By pointing all containers, as well as the Docker daemon, to Consul for DNS resolution. Yes, you read that right: even the Docker daemon uses Consul to resolve, via multiple SRV records returned per query, its necessary key-value store. No need for an external KVS when it is fully baked into the cluster and addressable in the exact same way from every node within.
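
To make that concrete, here is a minimal sketch of the Engine flags this arrangement implies. It assumes the node-local Consul agent answers DNS on the node's own IP (port 53) and that Consul's default `consul` service name and HTTP port 8500 are in use; the helper name and the sample IP are hypothetical.

```shell
# Hypothetical helper: print the Engine flags for a node, given its IP.
# Assumption: the local Consul agent serves DNS on that IP, so the daemon
# can resolve consul.service.consul to whichever Consul servers are alive.
engine_flags() {
  node_ip="$1"
  printf '%s\n' \
    "--dns=${node_ip}" \
    "--cluster-store=consul://consul.service.consul:8500" \
    "--cluster-advertise=${node_ip}:2376"
}

engine_flags 10.0.0.11
```

Because every node uses the same `consul.service.consul` name, the cluster-store address is identical everywhere; only the node IP changes.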


The concepts presented here were initially developed with a series of Makefiles and utilized Docker 1.9, Compose 1.5, and Machine 0.5. I have since adapted them into much simpler Bash scripts which were tested with Docker 1.10, Compose 1.6, and Machine 0.6. Please install these versions (or newer). Please also create an API access token at Digital Ocean and add it to your environment, e.g.
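
A minimal sketch follows. `DIGITALOCEAN_ACCESS_TOKEN` is the variable Docker Machine's digitalocean driver reads; whether the project's scripts expect exactly this name is an assumption on my part, and the guard function is purely illustrative.

```shell
# Placeholder value; substitute the token generated in the Digital Ocean
# control panel.
export DIGITALOCEAN_ACCESS_TOKEN="replace-with-your-token"

# Hypothetical guard: fail early when the token is missing.
require_token() {
  if [ -z "${DIGITALOCEAN_ACCESS_TOKEN:-}" ]; then
    echo "DIGITALOCEAN_ACCESS_TOKEN is not set" >&2
    return 1
  fi
}
```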

Additionally, you will need to set your MACHINE_STORAGE_PATH to the directory that you cloned into. I have supplied a .bashrc that can be sourced to set this up for you:
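
The supplied .bashrc is not reproduced here. A minimal equivalent, assuming you run it from the root of the clone, might be:

```shell
# Run from the root of the cloned repository so that Docker Machine keeps
# its certificates and machine state alongside the scripts.
export MACHINE_STORAGE_PATH="$(pwd)"
```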

Provisioning Phase One

Before we can build an overlay-ready cluster with built-in service discovery we need some nodes to commandeer via the Docker Machine generic driver. This first pass in effect pre-allocates all of the nodes, and hence IP addresses, that we will need in order to stand up our multi-master Consul/Swarm combo cluster via an automated script. Running the provided script will get you Bash source-able output that looks like:
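
The original output is not reproduced here. As an illustration only, with hypothetical variable names and documentation-range IP addresses, source-able output of this kind could be generated like so:

```shell
# Hypothetical helper: print one source-able line per node.
# The NODE_<NAME>_IP naming is illustrative, not the project's actual format.
emit_node() {
  name="$1"
  ip="$2"
  # Upper-case the name and replace dashes so it is a valid variable name.
  var=$(printf '%s' "$name" | tr 'a-z-' 'A-Z_')
  printf 'export NODE_%s_IP=%s\n' "$var" "$ip"
}

emit_node master-1 203.0.113.11
emit_node master-2 203.0.113.12
```

Redirecting this into a file and sourcing it gives later scripts the pre-allocated addresses without another round-trip to the provider.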

Provisioning Phase Two

The phase-two script leverages the output of phase one to commandeer the nodes set up therein. The meat of it is below:
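
The original listing is not reproduced here. What follows is a rough sketch of commandeering one master with the generic driver, written as a dry run that only prints the command. The helper name, SSH user, ports and the `consul.service.consul` address are assumptions; the docker-machine flags themselves are standard.

```shell
# Hypothetical dry-run helper: print the docker-machine command for one
# Swarm master. Arguments: machine name, node IP.
machine_create_cmd() {
  name="$1"
  ip="$2"
  echo docker-machine create \
    --driver generic \
    --generic-ip-address "$ip" \
    --generic-ssh-user root \
    --engine-opt "cluster-store=consul://consul.service.consul:8500" \
    --engine-opt "cluster-advertise=${ip}:2376" \
    --swarm --swarm-master \
    --swarm-discovery "consul://consul.service.consul:8500" \
    --swarm-opt replication \
    --swarm-opt "advertise=${ip}:3376" \
    "$name"
}

machine_create_cmd master-1 203.0.113.11
```

Note that both the Engine and Swarm are pointed at Consul here even though nothing is yet listening at that address.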

But how can this work? I see an apparent reference to Consul but it hasn’t yet been installed! As I have chosen to use the tools provided by Docker I am somewhat bound by their inadequacies — I am looking at you @DockerMachine:

Fortunately, the Docker daemon will happily retry connecting to the cluster-store, aka the KVS, every so often. This gives us time to underlay it via Docker Compose.
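
A provisioning script can apply the same retry idea while it waits for Consul to come up. The helper below is a generic sketch of my own, not part of the project's scripts:

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 30 2 curl -fs http://127.0.0.1:8500/v1/catalog/nodes` would poll the local Consul agent for up to a minute, on the assumption that this endpoint returns an error until a leader has been elected.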

At this point you might imagine that once the Consul masters achieve quorum and cluster, the Swarm is good to go. Not quite yet. The reason is that the Swarm masters default to the Google resolvers (because Machine gives no hooks for injecting the dns options for the Swarm master/agent containers during node “creation”), which prevents them from successfully looking up Consul from … Consul.

An alternative way to solve this would be to recreate the swarm master/agents via composition, making sure to pass in the appropriate dns options to all containers. This is more attractive and I plan to tackle it once I have an automated way to convert a run-time inspection of the container(s), e.g. `docker inspect swarm-agent-master`, to a composition, aka docker-compose.yml.
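
Short of a full composition, the effect can be sketched with a plain `docker run`. This dry-run helper only prints the command; the node IP, the port and the assumption that the local Consul agent serves DNS on that IP are mine, while `swarm-agent` is the container name Machine uses.

```shell
# Hypothetical dry-run helper: print a docker run command that recreates a
# Swarm agent with Consul as its resolver. Argument: node IP.
swarm_agent_cmd() {
  ip="$1"
  echo docker run -d --name swarm-agent --restart always \
    --dns "$ip" \
    swarm join --advertise "${ip}:2376" \
    "consul://consul.service.consul:8500"
}

swarm_agent_cmd 10.0.0.21
```

Once a container is running, `docker inspect --format '{{json .HostConfig.Dns}}' swarm-agent` shows which resolvers it was actually given.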

Within a minute or two of completing the provisioning for master nodes you should be able to point your Docker client to any of the Swarm masters:
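
A minimal sketch of doing so with Docker Machine, where the machine name you pass in is whatever you called the node at creation time (the function name is hypothetical):

```shell
# Point the local Docker client at a Swarm master managed by Docker Machine,
# then print cluster-wide information. Argument: machine name.
use_swarm() {
  eval "$(docker-machine env --swarm "$1")" && docker info
}
```

Calling `use_swarm` with one of your master names should list every node that has joined so far.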

After provisioning non-master nodes you should see them as participating in the Swarm within about a minute, but usually faster.

Room to Grow

The cluster as presented works fairly well. I can reboot some or all of the nodes and it will re-cluster without any manual intervention although sometimes with a good deal of patience while waiting for the various underlying services to perform their retries and sync up. Much like Michael Abrash, however, I am a big fan of computers performing their tasks near instantaneously. How to speed this up? The biggest gains are likely to be had by eliminating the retries that the Docker daemon is performing while establishing connections first to the KVS and then to the Swarm … which also waits on the KVS. Yeah, we have an order-of-operations issue.

Stupid Docker Tricks, Docker-in-Docker

The first feasible idea I had was to just use RancherOS and run Consul as a system service. This would make it run as a peer to the User Docker as everything in RancherOS is a container, with PID 1 being the System Docker. The fine folks at Rancher Labs are pouring a lot of know-how into RancherOS but it is not yet ready for production nor is it on the small list of approved operating systems where I work. Why can't such a setup be approximated?


There are a few well-understood issues with running Docker-in-Docker that result in degraded performance and/or outright corruption of data. Jérôme Petazzoni put together a nice, informative article discussing the how-tos and why-fors. Provided there are no other pitfalls, this should be doable.
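
As a starting point for experimentation, here is a dry-run sketch of launching a nested daemon from the official `docker:dind` image. Giving the inner daemon its own volume for /var/lib/docker is the usual way to sidestep the storage-driver problems; the container name and host path are hypothetical.

```shell
# Hypothetical dry-run helper: print a command that starts a nested "user"
# Docker daemon with its own storage volume.
dind_cmd() {
  echo docker run -d --privileged --name user-docker \
    -v /var/lib/user-docker:/var/lib/docker \
    docker:dind
}

dind_cmd
```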


We have a first cut at a Docker Cluster that is ready for some “real” work. There are some pain points during startup and recovery of nodes but patience will get you past these. We also have an interesting proposal for eliminating said pain points.
