Docker Kata 001

>>> Into the Swarm

Welcome to a new series of “Docker Kata.” If you’re familiar with Agile software development, you’ve probably heard of CodeKata. I’d like to apply a similar practice of repeated, hands-on exploration to all the great new topics around Docker container virtualization.

You’ve probably heard the announcements from DockerCon 2016 about Docker’s new built-in orchestration.

Here we’ll begin to explore Docker Swarm along with the complementary technology designed for large-scale cluster orchestration. In this case, we’ll explore Docker’s new Swarm mode with the help of some local virtualization. You can run similar clusters on DigitalOcean, GCE, or AWS by changing the Docker Machine driver.

For this kata, make sure you have Docker Machine running on your laptop, either through the Docker Toolbox or, if you have Docker Beta (OS X / Windows), by downloading Docker Machine with the appropriate driver. Docker Beta will handle the upgrade automatically. If you’d like a shortcut script for getting this cluster spun up, check here.
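Before creating any machines, it’s worth a quick sanity check that the client tools are in place. The version numbers below are illustrative of the 1.12-era tooling this kata assumes; yours may differ:

```shell
# Confirm the Docker client and Docker Machine are installed and on your PATH.
$ docker --version
Docker version 1.12.0, build 8eab29e

$ docker-machine --version
docker-machine version 0.8.0, build b85aac1

# VirtualBox is required for the virtualbox driver used below.
$ VBoxManage --version
5.1.2r108956
```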


Let’s get started.

Create three VirtualBox-powered nodes for the swarm. Here’s a quick one-liner,

$ for i in master node01 node02; do docker-machine create -d virtualbox $i; done

or a full set of commands:

$ docker-machine create -d virtualbox master
Running pre-create checks…
Creating machine…

$ docker-machine create -d virtualbox node01
Running pre-create checks…
Creating machine…

$ docker-machine create -d virtualbox node02
Running pre-create checks…
Creating machine…

This will take a minute or so to finish, depending on the speed of your laptop. Let’s check to make sure that your Docker VM hosts have booted up:

$ docker-machine ls
NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS
master - virtualbox Running tcp://192.168.99.103:2376 v1.12.0
node01 - virtualbox Running tcp://192.168.99.104:2376 v1.12.0
node02 - virtualbox Running tcp://192.168.99.105:2376 v1.12.0

Once those VMs are running in VirtualBox, you’ll have three Docker engines at your disposal. We’ll use the Docker engines running on those VMs to create our swarm cluster.

First, let’s create the swarm mode master. We’ll use the Docker engine’s ability to run remotely by passing the output of `docker-machine config` as arguments. You’ll notice here as well that instead of switching your environment variables around, we’ll use Bash command substitution and Docker Machine config to address each server.
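To see what’s actually being substituted, print the config yourself. It’s a set of TLS and host flags for the Docker client; the certificate paths and IP below are illustrative and will differ on your machine:

```shell
$ docker-machine config master
--tlsverify
--tlscacert="/Users/you/.docker/machine/machines/master/ca.pem"
--tlscert="/Users/you/.docker/machine/machines/master/cert.pem"
--tlskey="/Users/you/.docker/machine/machines/master/key.pem"
-H=tcp://192.168.99.103:2376
```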

$ docker $(docker-machine config master) swarm init \
--advertise-addr $(docker-machine ip master):2377

Next, let’s join the second machine to the swarm as a worker. We’re using the config as arguments to execute the Docker command remotely again, as we did with the master node. We’ll also add token extraction, which means executing one Docker engine command inside another, using backticks,

`docker $(docker-machine config master) swarm join-token worker -q`

inside our join command.

$ docker $(docker-machine config node01) swarm join \
--token `docker $(docker-machine config master) swarm join-token worker -q` \
$(docker-machine ip master):2377
$ docker $(docker-machine config node02) swarm join \
--token `docker $(docker-machine config master) swarm join-token worker -q` \
$(docker-machine ip master):2377

Running `docker swarm join` accomplishes several things.

It sets the Docker engine on the node to swarm mode. It also requests a TLS certificate from the manager, sets the node name to the virtual machine’s hostname, joins the node to the swarm using the swarm token, and sets its availability to “Active.” Swarm also inserts the node into the pre-existing ingress overlay network of the swarm.
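You can verify the first of those effects directly. On a 1.12 engine, `docker info` reports the swarm state, so grepping its Swarm section on a worker shows something like the following (the NodeID is, of course, specific to your cluster):

```shell
# Check that node01's engine has switched to swarm mode.
$ docker $(docker-machine config node01) info | grep -A 2 '^Swarm'
Swarm: active
 NodeID: a4e48u7wrtfmy935b98wf32yj
 Is Manager: false
```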

You can see that network by issuing the `docker network ls` command:

$ docker $(docker-machine config master) network ls
NETWORK ID NAME DRIVER SCOPE
3d7f73f718e2 bridge bridge local
bd2c68b0b740 docker_gwbridge bridge local
a9ab72d43cb4 host host local
3n6c58p3qjyu ingress overlay swarm
ca58df367644 none null local

You’ll notice the docker_gwbridge network, which gives the containers external connectivity outside of their cluster.
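If you’re curious about the ingress network itself, you can inspect it and see the overlay driver and swarm scope. The output below is abbreviated, and the subnet shown is the default one Swarm picks; yours may vary:

```shell
$ docker $(docker-machine config master) network inspect ingress
[
    {
        "Name": "ingress",
        "Scope": "swarm",
        "Driver": "overlay",
        "IPAM": {
            "Config": [
                {
                    "Subnet": "10.255.0.0/16"
                }
            ]
        },
        ...
    }
]
```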

If you accidentally use the incorrect token and join your worker node as another manager, you’ll find that you have to destroy and re-create your cluster due to how the Raft consensus protocol works. Here’s an example of what this looks like. Note that both nodes have a status under “MANAGER STATUS.”

$ docker $(docker-machine config master) node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
1f1be48ataio89evarv5xbk0y * master Ready Active Leader
a4e48u7wrtfmy935b98wf32yj node01 Ready Active Reachable

You can try stripping the cluster back down to a single node, but you’ll see the following error.

$ docker $(docker-machine config node01) swarm leave
Error response from daemon: You are attempting to leave cluster on a node that is participating as a manager. Leaving the cluster will leave you with 1 managers out of 2. This means Raft quorum will be lost and your cluster will become inaccessible. The only way to restore a cluster that has lost consensus is to reinitialize it with `--force-new-cluster`. Use `--force` to ignore this message.

Let’s leave the cluster, initialize a new cluster on the remaining manager, and rejoin the node. You’ll need to use `--force-new-cluster` to reinitialize.

$ docker $(docker-machine config node01) swarm leave --force
$ docker $(docker-machine config master) swarm init --force-new-cluster --advertise-addr $(docker-machine ip master):2377
$ docker $(docker-machine config node01) swarm join \
--token `docker $(docker-machine config master) swarm join-token worker -q` \
$(docker-machine ip master):2377

Let’s also set up Mano Marks’s visualizer, so we can observe what’s going on with our cluster. We’ll start it on the master node, on a port that won’t conflict with our upcoming webapp service.

$ docker $(docker-machine config master) run -it -d -p 5000:5000 \
-e HOST=`docker-machine ip master` \
-e PORT=5000 \
-v /var/run/docker.sock:/var/run/docker.sock \
manomarks/visualizer
A fresh visualizer

That container is going to be running on the master node, so let’s point our Docker client at the master’s Docker engine, and check to see what ports that container is advertising.

$ docker $(docker-machine config master) port `docker $(docker-machine config master) ps -ql`
5000/tcp -> 0.0.0.0:5000

We can now check the cluster status. Let’s set ourselves another Bash variable in order to save some typing.

$ master=$(docker-machine config master)
$ docker $master node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
1v1pbijat414p1clwynrsjgga node01 Down Active
6m1y28qnhagsmaqdg1i74mgz4 * master Ready Active Leader
99eyb7120373kfic8kcjct7kr node02 Ready Active
9wdiv6bf4t4sj4p68hjvcxebh node01 Ready Active

Excellent! Let’s also remove the stale entry for node01, which is still marked Down.

$ docker $(docker-machine config master) node rm 1v1pbijat414p1clwynrsjgga
$ docker $master node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
6m1y28qnhagsmaqdg1i74mgz4 * master Ready Active Leader
99eyb7120373kfic8kcjct7kr node02 Ready Active
9wdiv6bf4t4sj4p68hjvcxebh node01 Ready Active

Once we’re done switching back and forth between Docker engines, let’s set our configuration to work with the master node.

$ eval $(docker-machine env master)
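Under the hood, `docker-machine env` prints the export statements that point your local client at the remote engine, and `eval` applies them to your current shell. The output looks something like this (the paths and IP are illustrative):

```shell
$ docker-machine env master
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.168.99.103:2376"
export DOCKER_CERT_PATH="/Users/you/.docker/machine/machines/master"
export DOCKER_MACHINE_NAME="master"
# Run this command to configure your shell:
# eval $(docker-machine env master)
```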

We’ll create our first service, a locally running example of the game 2048. We’ll use alexwhen’s excellent image from Docker Hub.

$ docker service create --replicas 1 --name hello2048 -p 8888:80 alexwhen/docker-2048

You’ll be able to see a single container running on the visualizer, which is served at your master node’s IP address on port 5000. In my case, I navigate to http://192.168.99.117:5000/ in my web browser.

Our first service running, with replicas set to 1 container.

Let’s also take a look at the command line.

$ docker service ls
ID NAME REPLICAS IMAGE COMMAND
d6a1dwgl3cjk hello2048 1/1 alexwhen/docker-2048

You should see this if the 2048 service has launched successfully.

Let’s scale up the number of replicas from a single container to 9,

$ docker service scale hello2048=9

and then check how many tasks the scale-out produced.

$ docker service ps hello2048
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
7rsjsw7kr8u63pz9cjmya0hbd hello2048.1 alexwhen/docker-2048 master Running Running 6 minutes ago
0jo2ow301gzafuctftjbltb5j hello2048.2 alexwhen/docker-2048 node01 Running Running 6 seconds ago
dzmdqobz3s7ezn8uiitkuteby hello2048.3 alexwhen/docker-2048 node02 Running Running 6 seconds ago
5dbc2uhvxejmqyp8epfhht4k2 hello2048.4 alexwhen/docker-2048 node01 Running Running 5 seconds ago
8yen10t3kkm4gni2a4iq617om hello2048.5 alexwhen/docker-2048 node01 Running Running 6 seconds ago
ezlsc0yukwvcjlk2jpmfxl8u7 hello2048.6 alexwhen/docker-2048 master Running Running 8 seconds ago
0er0z5lzoxwwghy2shfn8czbs hello2048.7 alexwhen/docker-2048 node01 Running Running 5 seconds ago
bu2r7jw0bijoelq2d6m79e3lp hello2048.8 alexwhen/docker-2048 master Running Running 8 seconds ago
cm5z908p6zg777a23eyaqtsmu hello2048.9 alexwhen/docker-2048 node02 Running Running 6 seconds ago

The visualizer should represent a similar scale out.

Scale out to 9 containers!

If you want, kill one of the containers in the cluster, and watch Docker Swarm handle the failure. You can kill the container directly, or through the service name. I’ll use the container ID here. If you’re quick, you can switch over to the visualizer and see the container get killed and respawn.

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6cfc9b50e352 alexwhen/docker-2048:latest "nginx -g 'daemon off" About a minute ago Up About a minute 80/tcp hello2048.5.dwpgf0d0j6cmcintatpwnhor2
a189161e10b4 alexwhen/docker-2048:latest "nginx -g 'daemon off" 4 minutes ago Up 4 minutes 80/tcp hello2048.9.8wpa550b7qa6bq9dk7j745sy4
7892d4774f52 alexwhen/docker-2048:latest "nginx -g 'daemon off" 5 minutes ago Up 5 minutes 80/tcp hello2048.1.40avij8crwrrpkla4jh32g03w
f661a229eec6 manomarks/visualizer "npm start" 13 minutes ago Up 13 minutes 0.0.0.0:5000->5000/tcp, 8080/tcp adoring_noyce
$ docker rm -f 6cfc9b50e352

Let’s watch!

Once you kill the container, the visualizer will show the container being removed, and then reprovisioned. In the resulting service list, you can see the replacement container described by the `docker service ps` command.

$ docker service ps hello2048
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR
40avij8crwrrpkla4jh32g03w hello2048.1 alexwhen/docker-2048 master Running Running 4 minutes ago
6z9puq0erqfno9ok7qe762fz9 hello2048.2 alexwhen/docker-2048 node02 Running Running 3 minutes ago
03csvhbl194o3ejhqyep1lyzj hello2048.3 alexwhen/docker-2048 node02 Running Running 3 minutes ago
0rbuvnz5z2umcaqs3kzhphk23 hello2048.4 alexwhen/docker-2048 node01 Running Running 3 minutes ago
dwpgf0d0j6cmcintatpwnhor2 hello2048.5 alexwhen/docker-2048 master Running Running less than a second ago
dt4fd2nz9kam5e23z57sps7wd \_ hello2048.5 alexwhen/docker-2048 master Shutdown Failed 5 seconds ago “task: non-zero exit (137)”
3kcb4m6u58il2vugw2a8u0tvx hello2048.6 alexwhen/docker-2048 node02 Running Running 3 minutes ago
a82xv3677p1s8i7xm6se26uyo hello2048.7 alexwhen/docker-2048 node01 Running Running 3 minutes ago
0ige39agf50yfl43l6ow6szea hello2048.8 alexwhen/docker-2048 node01 Running Running 3 minutes ago
8wpa550b7qa6bq9dk7j745sy4 hello2048.9 alexwhen/docker-2048 master Running Running 3 minutes ago

Once we’re finished, we can shut the service down.

$ docker service rm hello2048
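And when you’re finished with the kata entirely, you can reclaim the disk space and memory by removing the three VirtualBox VMs, reusing the same loop we used to create them. Note that `-f` removes the machines and their disks without confirmation:

```shell
# Destroy the VMs created at the start of the kata (irreversible).
$ for i in master node01 node02; do docker-machine rm -f $i; done
```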

Based on these explorations, you should have a good basic understanding of Docker Swarm. Soon, we’ll explore node draining, rolling updates, and distributed application bundles. See you next time!

Thanks for reading.