Subreptivus
Nov 30, 2016

The purpose of this article is to describe service failover in a Windows Docker Swarm cluster spanning separate hosts.
Don’t expect a deep dive into technical explanations or internals. I will use a very basic setup with a web server container (microsoft/iis:nanoserver) as an example.
In the end, you should decide for yourself whether Docker Swarm on Windows is currently mature enough to use in a live environment.

TL;DR

Docker Swarm mode (the clustering/orchestration solution integrated into Docker Engine since 1.12) doesn’t work on Windows at the moment.
Docker Swarm (the standalone container orchestration component available since the 1.6 release) works as expected.

Prerequisites

  • Three separate hosts or virtual machines with their associated IP addresses (in this lab: swarm-manager [10.100.0.4], swarm-node-1-beta [10.100.0.7] and swarm-node-2-dev [10.100.0.12]).
  • All hosts running Docker version 1.12 or later within the same subnet, with ports 2375, 2377, 4789 and 7946 open in addition to SSH and RDP/WinRM.
  • The SSH password/key and RDP/WinRM credentials.
  • Internet access from all the hosts to Docker Hub.

Legend

In this Lab I will use following configuration:

  • Swarm Pool/Manager server [10.100.0.4] — Ubuntu 16.04.1 LTS with Docker version 1.12.3 and Swarm version 1.2.5.
  • Swarm Worker 1 [10.100.0.12] — Microsoft Windows Server 2016 Datacenter (10.0.14393) with Docker version 1.13.0-rc2 and Swarm version 1.2.5.
  • Swarm Worker 2 [10.100.0.7] — Microsoft Windows Server 2016 Datacenter (10.0.14393) with Docker version 1.12.2-cs2-ws-beta and Swarm version 1.2.5.
  • All commands are prefixed with the IP address and the shell of the host where they were executed: “BASH” is Bash (Unix shell) and “PS” is PowerShell (Windows shell).

Later on I will explain why I used different Docker versions on the two Windows hosts.

Prepare Manager Server

For simplicity, I will use Ubuntu 16.04.1 LTS as the Swarm Pool/Manager server.

  • SSH into Ubuntu server — [10.100.0.4]
  • Install Docker on Ubuntu server

There is no need to repeat the official installation documentation here.
Once Docker is installed, change the default daemon start configuration so that the daemon also listens on TCP port 2375:

[10.100.0.4]:BASH $ sudo service docker stop
[10.100.0.4]:BASH $ sudo sed -i 's,^ExecStart=/usr/bin/dockerd.*$,ExecStart=/usr/bin/dockerd -H unix:// -H 0.0.0.0:2375,' /lib/systemd/system/docker.service
[10.100.0.4]:BASH $ sudo systemctl daemon-reload
[10.100.0.4]:BASH $ sudo service docker start
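Before editing the real unit file, it can help to preview what the sed expression produces. The sketch below is a minimal helper, assuming the stock Ubuntu unit file has an ExecStart line beginning with /usr/bin/dockerd (for example ExecStart=/usr/bin/dockerd -H fd://); the function name rewrite_execstart is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: applies the same sed substitution used above to
# text read from stdin, so the result can be inspected without touching
# /lib/systemd/system/docker.service. Non-matching lines pass through.
rewrite_execstart() {
  sed 's,^ExecStart=/usr/bin/dockerd.*$,ExecStart=/usr/bin/dockerd -H unix:// -H 0.0.0.0:2375,'
}
```

For instance, grep '^ExecStart=' /lib/systemd/system/docker.service | rewrite_execstart prints the line the sed command above would write, without changing anything on disk.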
  • Get a Swarm ID

Swarm uses a distributed key-value store to cluster hosts together. It ships with a simple scheduling backend out of the box and uses Docker Hub as a hosted discovery service.
Using Docker Hub’s hosted discovery service requires that each node in the swarm is connected to the public internet.

Warning: The Docker Hub Hosted Discovery Service is not recommended for production use. It’s intended for testing/development. See the Swarm discovery backend documentation (e.g. Consul, etcd or ZooKeeper) for production use.

Create the cluster.

[10.100.0.4]:BASH $ docker run --rm swarm create

Save the token (unique ID) printed on the last line of the output (e.g. ab17160f0e95a316792da776efe7d95a).
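Rather than copying the token by hand, you can capture it in a shell variable on the manager. This is a sketch under the assumption, based only on the example token above, that the ID is a 32-character lowercase hex string; the helper name extract_swarm_token is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads the output of `docker run --rm swarm create`
# from stdin and prints the token on the last line, but only if it looks
# like the 32-character hex ID from the example (an assumption, not a
# documented format).
extract_swarm_token() {
  local token
  # Keep only the last line and strip whitespace/CR characters.
  token=$(tail -n 1 | tr -d '[:space:]')
  if printf '%s' "$token" | grep -Eq '^[0-9a-f]{32}$'; then
    printf '%s\n' "$token"
  else
    echo "unexpected token: '$token'" >&2
    return 1
  fi
}

# Usage on the manager:
#   SWARM_TOKEN=$(docker run --rm swarm create | extract_swarm_token)
```

The swarm manage command on the manager could then reference token://$SWARM_TOKEN; on the Windows nodes you still paste the literal value.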

Prepare Swarm Nodes

  • Connect to the remote Windows Server.
  • Install Windows Updates

Critical updates are required in order for the Windows Container feature to function.
Ensure your Windows Server system is up-to-date by running:

[10.100.0.7]: PS > sconfig

This shows a text-based configuration menu, where you can choose option 6 to Download and Install Updates.
When prompted, choose option A to download all updates.
Reboot the system once the updates have been applied.

[10.100.0.7]: PS > Restart-Computer -Force

Once it is back up, re-establish the remote connection to the Windows Server.

  • Install Docker

Docker is required in order to work with Windows containers. To install Docker we will use the OneGet provider PowerShell module. The provider will enable the containers feature on your machine and install Docker — this will require a reboot.

First, install the DockerMsftProvider OneGet PowerShell module.

[10.100.0.7]: PS > Install-Module -Name DockerMsftProvider -Repository PSGallery -Force

Next we will use OneGet to install the latest version of Docker.

[10.100.0.7]: PS > Install-Package -Name docker -ProviderName DockerMsftProvider

When the installation is complete, reboot the computer.

[10.100.0.7]: PS > Restart-Computer -Force

Once it is back up, re-establish the remote PowerShell connection.

  • Configure Firewall

To keep testing simple, we will disable the firewall entirely.

[10.100.0.7]: PS > netsh advfirewall set allprofiles state off

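If you would rather keep the firewall enabled, create inbound rules for the required ports instead. For Swarm mode cluster management this is TCP port 2377; note that the standalone Swarm setup in this lab also needs the unsecured Docker Engine port 2375 (configured below) to be reachable from the manager.

A rule can be added like so:

PS > netsh advfirewall firewall add rule name="Docker Swarm daemon" dir=in action=allow protocol=TCP localport=2377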

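The default Engine and Swarm ports are:
  • Docker Engine API, unsecured (used by standalone Swarm in this lab): 2375/tcp
  • Swarm mode cluster management: 2377/tcp
  • Container overlay network traffic: 4789/udp
  • Communication among nodes, including discovery of other container networks: 7946/tcp and 7946/udp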
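If you prefer to keep the Windows firewall on rather than disabling it, you need one inbound rule per port and protocol. The sketch below, run on the Ubuntu manager, simply prints the netsh commands for the Swarm mode ports listed above plus the unsecured Engine port 2375 that the standalone Swarm manager in this lab connects to; you would paste the output into an elevated PowerShell session on each Windows node. The rule names and the helper name netsh_rules are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints one netsh firewall rule per port/protocol pair.
# It only generates text; nothing is executed on the Windows nodes.
netsh_rules() {
  local spec port proto proto_uc
  for spec in 2375/tcp 2377/tcp 4789/udp 7946/tcp 7946/udp; do
    port=${spec%/*}    # part before the slash, e.g. 2375
    proto=${spec#*/}   # part after the slash, e.g. tcp
    proto_uc=$(printf '%s' "$proto" | tr '[:lower:]' '[:upper:]')
    printf 'netsh advfirewall firewall add rule name="Docker %s/%s" dir=in action=allow protocol=%s localport=%s\n' \
      "$port" "$proto" "$proto_uc" "$port"
  done
}
```

Generating the rules from a single port list keeps the TCP and UDP entries for 7946 from being forgotten, which is an easy mistake when typing the commands by hand.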

  • Configure the Docker Engine to accept incoming connection over TCP

First, create a daemon.json file at c:\ProgramData\docker\config\daemon.json on the Windows Server host.

[10.100.0.7]: PS > new-item -Type File c:\ProgramData\docker\config\daemon.json

Next, run the following command to add the connection configuration to the daemon.json file. This configures the Docker Engine to accept incoming connections over TCP port 2375. This is an unsecured connection and is not advised, but it can be used for isolated testing. For more information on securing this connection, see Protect the Docker Daemon on Docker.com.

[10.100.0.7]: PS > Add-Content 'c:\programdata\docker\config\daemon.json' '{ "hosts": ["tcp://0.0.0.0:2375", "npipe://"] }'

Restart the Docker service.


[10.100.0.7]: PS > Restart-Service docker

For the other node (swarm-node-2-dev [10.100.0.12]) the steps are the same, plus a few extra ones that replace the installed Docker binaries with the 1.13.0-rc2 development build:


[10.100.0.12]: PS > Invoke-WebRequest "https://test.docker.com/builds/Windows/x86_64/docker-1.13.0-rc2.zip" -OutFile "$env:TEMP\docker.zip" -UseBasicParsing
[10.100.0.12]: PS > Stop-Service docker
[10.100.0.12]: PS > Expand-Archive -Path "$env:TEMP\docker.zip" -DestinationPath $env:ProgramFiles -Force
[10.100.0.12]: PS > Start-Service docker
  • Install Base Container Images

Before working with Windows Containers, a base image needs to be installed. For detailed information on Docker container images, see Build your own images on docker.com.

To install the Nano Server base image run the following:

[10.100.0.7]: PS > docker pull microsoft/nanoserver

Please read the Windows Containers OS Image EULA before using the image.

Configure Swarm Agents

Each host is going to act as a pool of resources for the cluster. Therefore, a swarm agent must be installed on each host.

There is no public Swarm binary for Windows, so you can either build your own from source (the so-called “Docker way”) or use the ready-made container image from Stefan Scherer. Replace the string after token:// with the Swarm token from earlier.

[10.100.0.12]: PS > docker run -d --restart=always --name swarm-node-2-dev stefanscherer/swarm-windows:latest-nano join "--addr=10.100.0.12:2375" "token://ab17160f0e95a316792da776efe7d95a"
[10.100.0.7]: PS > docker run -d --restart=always --name swarm-node-1-beta stefanscherer/swarm-windows:latest-nano join "--addr=10.100.0.7:2375" "token://ab17160f0e95a316792da776efe7d95a"

Configure Swarm Master

Now that each of our hosts is acting as a resource for the pool, we need a manager for these resources. This manager becomes the Docker endpoint: we redirect our Docker Engine commands to the Swarm master, and when we issue a command to create a new container, the Swarm master looks at the pool of resources and decides where to place it.
I will publish the container’s port 2375 (the Docker API port) as host port 3375, because the Ubuntu host’s own Docker daemon already listens on the default port 2375 and the Swarm manager itself runs as a container on that same host.

Now run the Swarm Master container.

[10.100.0.4]:BASH $ docker run -d --restart=always --name swarm-manager -p 3375:2375 swarm manage token://ab17160f0e95a316792da776efe7d95a

Once it is running, the Swarm Master can be reached with the docker -H parameter, like so: docker -H tcp://127.0.0.1:3375.

For convenience, I will set the Docker host in the DOCKER_HOST environment variable instead.

[10.100.0.4]:BASH $ export DOCKER_HOST=tcp://127.0.0.1:3375

The docker info command displays system-wide information about the cluster.

[10.100.0.4]:BASH $ docker info
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 7
Server Version: swarm/1.2.5
Role: primary
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 2
worker1-1-13-0-dev: 10.100.0.12:2375
└ ID: I265:UDUQ:3BQN:TQZV:PFCS:AUBB:TGB5:WXBY:NDO7:ETG7:VC6N:S2ZL
└ Status: Healthy
└ Containers: 1 (1 Running, 0 Paused, 0 Stopped)
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 1.05 GiB
└ Labels: kernelversion=10.0 14393 (14393.447.amd64fre.rs1_release_inmarket.161102-0100), operatingsystem=Windows Server 2016 Datacenter, storagedriver=windowsfilter
└ UpdatedAt: 2016-11-29T14:40:36Z
└ ServerVersion: 1.13.0-rc2
worker2-1-12-2-cs2-ws-beta: 10.100.0.7:2375
└ ID: QFTP:66XY:AX3Z:P64M:5AV5:OYJT:6NTE:LML3:SFUV:NC4T:7A6U:3LE2
└ Status: Healthy
└ Containers: 1 (1 Running, 0 Paused, 0 Stopped)
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 2.1 GiB
└ Labels: kernelversion=10.0 14393 (14393.447.amd64fre.rs1_release_inmarket.161102-0100), operatingsystem=Windows Server 2016 Datacenter, storagedriver=windowsfilter
└ UpdatedAt: 2016-11-29T14:40:30Z
└ ServerVersion: 1.12.2-cs2-ws-beta
Plugins:
Volume:
Network:
Swarm:
NodeID:
Is Manager: false
Node Address:
Security Options:
Kernel Version: 4.4.0-47-generic
Operating System: linux
Architecture: amd64
CPUs: 2
Total Memory: 3.149 GiB
Name: 3e6a688d23c5
Docker Root Dir:
Debug Mode (client): false
Debug Mode (server): false
WARNING: No kernel memory limit support
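To follow node health during the failover tests without reading the whole docker info output, a small filter can print just each node’s name and Status. This is a sketch that assumes the text layout shown above (a name: ip:port header line followed by └ Status: lines); docker info output is meant for humans and may change between Swarm versions, so treat the parser as fragile. The function name node_status is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `docker info` output (from the Swarm manager)
# on stdin and prints "<node> <status>" for every node it finds.
node_status() {
  awk '
    # A node header looks like "<name>: <ip>:<port>",
    # e.g. "worker1-1-13-0-dev: 10.100.0.12:2375"
    /^[ \t]*[^ \t]+: [0-9.]+:[0-9]+[ \t]*$/ {
      node = $1
      sub(/:$/, "", node)
      next
    }
    # The first "Status:" line after a header belongs to that node
    /Status: / && node != "" {
      status = $0
      sub(/.*Status: /, "", status)
      print node, status
      node = ""
    }
  '
}
```

With DOCKER_HOST pointing at the manager, docker info 2>/dev/null | node_status should print one line per node, such as worker1-1-13-0-dev Healthy.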

The docker ps command lists every container running in the cluster.

[10.100.0.4]:BASH $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
86f816440329 stefanscherer/swarm-windows:latest-nano "\\swarm.exe join --ad" 2 minutes ago Up 2 minutes worker2-1-12-2-cs2-ws-beta/swarm-node-1-beta
277bc8eef8bd stefanscherer/swarm-windows:latest-nano "\\swarm.exe join --ad" 6 minutes ago Up 6 minutes worker1-1-13-0-dev/swarm-node-2-dev

Using Docker with Swarm

Now we have two container hosts (Workers) combined into a Swarm cluster.
Running docker run against the Swarm master starts the WebServer container (microsoft/iis:nanoserver) on one of the Workers.

[10.100.0.4]:BASH $ docker run -d --restart=always --env reschedule:on-node-failure --env constraint:operatingsystem=="Windows Server 2016 Datacenter" --name WebServer --publish 80:80 microsoft/iis:nanoserver

Why these particular options?
--restart=always: Always restart the container, regardless of its exit status.
--env reschedule:on-node-failure: Swarm’s simple rescheduling policy; if the node fails, the container is recreated on another node.
--env constraint:operatingsystem=="Windows Server 2016 Datacenter": Runs the container only on Windows nodes. In this lab only the two Windows hosts are joined to the cluster (Nodes: 2 in docker info), but if a Linux host were part of the pool, the default spread strategy could place the container there, and it would definitely fail, because microsoft/iis is a native Windows container image.
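The constraint and the default spread strategy work together: nodes that do not match the constraint are filtered out first, and the container is then placed on one of the remaining nodes. The sketch below is a deliberately simplified illustration of that idea, not Swarm’s actual scheduler (which also weighs reserved CPU and memory): it keeps only nodes whose operatingsystem value matches exactly and picks the one running the fewest containers. The input format name|operatingsystem|containers and the helper name pick_node are invented for the example.

```shell
#!/bin/sh
# Hypothetical, simplified "constraint filter + spread" placement.
# $1:    required operatingsystem value (exact match)
# stdin: one node per line as name|operatingsystem|containers
# Prints the chosen node, or exits with status 1 if no node matches.
pick_node() {
  awk -F'|' -v want="$1" '
    $2 == want {                                  # constraint filter
      if (best == "" || $3 + 0 < min) {           # spread: fewest containers
        best = $1
        min = $3 + 0
      }
    }
    END {
      if (best == "") exit 1                      # nothing satisfies the constraint
      print best
    }
  '
}
```

Without the filter, a hypothetical Linux node with zero containers would win the spread comparison, which is exactly the failure the constraint prevents.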

The docker ps command lists every container running in the cluster.

[10.100.0.4]:BASH $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
eabedec1f658 microsoft/iis:nanoserver "C:\\ServiceMonitor.ex" 17 seconds ago Up 14 seconds 10.100.0.12:80->80/tcp worker1-1-13-0-dev/WebServer
86f816440329 stefanscherer/swarm-windows:latest-nano "\\swarm.exe join --ad" 2 minutes ago Up 2 minutes worker2-1-12-2-cs2-ws-beta/swarm-node-1-beta
277bc8eef8bd stefanscherer/swarm-windows:latest-nano "\\swarm.exe join --ad" 6 minutes ago Up 6 minutes worker1-1-13-0-dev/swarm-node-2-dev

Check if the site is actually reachable.

[10.100.0.4]:BASH $ curl http://10.100.0.12:80
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>IIS Windows Server</title>
<style type="text/css">
<!--
body {
color:#000000;
background-color:#0072C6;
margin:0;
}
#container {
margin-left:auto;
margin-right:auto;
text-align:center;
}
a img {
border:none;
}
-->
</style>
</head>
<body>
<div id="container">
<a href="http://go.microsoft.com/fwlink/?linkid=66138&amp;clcid=0x409"><img src="iisstart.png" alt="IIS" width="960" height="600" /></a>
</div>
</body>
</html>
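Instead of eyeballing the HTML, the check can be reduced to an exit status, which is handy when repeating it after each failover step. This sketch assumes the default IIS page shown above is being served and simply looks for its title; the helper names is_iis_default_page and check_site are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads an HTTP response body on stdin and succeeds
# if it contains the <title> of the default IIS page shown above.
is_iis_default_page() {
  grep -q '<title>IIS Windows Server</title>'
}

# Hypothetical wrapper: $1 is the URL of the published WebServer port,
# e.g. http://10.100.0.12:80. Gives up after 5 seconds.
check_site() {
  curl -s --max-time 5 "$1" | is_iis_default_page
}
```

For example, check_site http://10.100.0.12:80 && echo reachable prints "reachable" only while the container on that node is serving the default page.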

Next, kill the process that runs inside the container on the respective node. Microsoft uses the ServiceMonitor.exe binary as the container’s entry point to check whether the w3svc service is still running, so killing it stops the container.

[10.100.0.12]: PS > Stop-Process -ProcessName ServiceMonitor -Force

Now check docker ps again.

[10.100.0.4]:BASH $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
eabedec1f658 microsoft/iis:nanoserver "C:\\ServiceMonitor.ex" 21 minutes ago Up 6 seconds 10.100.0.12:80->80/tcp worker1-1-13-0-dev/WebServer
86f816440329 stefanscherer/swarm-windows:latest-nano "\\swarm.exe join --ad" 22 minutes ago Up 22 minutes worker2-1-12-2-cs2-ws-beta/swarm-node-1-beta
277bc8eef8bd stefanscherer/swarm-windows:latest-nano "\\swarm.exe join --ad" 27 minutes ago Up 27 minutes worker1-1-13-0-dev/swarm-node-2-dev

What happened? Everything was left intact except the status, “Up 6 seconds”: the --restart=always policy restarted the same container on the same node.

The node’s Status in the docker info output didn’t change either, because from the cluster’s perspective the node is still up and running.

[10.100.0.4]:BASH $ docker info
Containers: 3
Running: 3
Paused: 0
Stopped: 0
Images: 7
Server Version: swarm/1.2.5
Role: primary
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 2
worker1-1-13-0-dev: 10.100.0.12:2375
└ ID: I265:UDUQ:3BQN:TQZV:PFCS:AUBB:TGB5:WXBY:NDO7:ETG7:VC6N:S2ZL
└ Status: Healthy
└ Containers: 2 (2 Running, 0 Paused, 0 Stopped)
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 1.05 GiB
└ Labels: executiondriver=<not supported>, kernelversion=10.0 14393 (14393.447.amd64fre.rs1_release_inmarket.161102-0100), operatingsystem=Windows Server 2016 Datacenter, storagedriver=windowsfilter
└ UpdatedAt: 2016-11-29T15:07:32Z
└ ServerVersion: 1.13.0-rc2
worker2-1-12-2-cs2-ws-beta: 10.100.0.7:2375
└ ID: QFTP:66XY:AX3Z:P64M:5AV5:OYJT:6NTE:LML3:SFUV:NC4T:7A6U:3LE2
└ Status: Healthy
└ Containers: 1 (1 Running, 0 Paused, 0 Stopped)
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 2.1 GiB
└ Labels: kernelversion=10.0 14393 (14393.447.amd64fre.rs1_release_inmarket.161102-0100), operatingsystem=Windows Server 2016 Datacenter, storagedriver=windowsfilter
└ UpdatedAt: 2016-11-29T15:08:02Z
└ ServerVersion: 1.12.2-cs2-ws-beta
Plugins:
Volume:
Network:
Swarm:
NodeID:
Is Manager: false
Node Address:
Security Options:
Kernel Version: 4.4.0-47-generic
Operating System: linux
Architecture: amd64
CPUs: 2
Total Memory: 3.149 GiB
Name: 3e6a688d23c5
Docker Root Dir:
Debug Mode (client): false
Debug Mode (server): false
WARNING: No kernel memory limit support

Now stop the Docker service to simulate the host going down.

[10.100.0.12]: PS > Stop-Service docker

Run docker info again. What does the manager report about the node’s health status?

[10.100.0.4]:BASH $ docker info
Containers: 3
Running: 3
Paused: 0
Stopped: 0
Images: 7
Server Version: swarm/1.2.5
Role: primary
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 2
worker1-1-13-0-dev: 10.100.0.12:2375
└ ID: I265:UDUQ:3BQN:TQZV:PFCS:AUBB:TGB5:WXBY:NDO7:ETG7:VC6N:S2ZL
└ Status: Unhealthy
└ Containers: 1
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 1.05 GiB
└ Labels: executiondriver=<not supported>, kernelversion=10.0 14393 (14393.447.amd64fre.rs1_release_inmarket.161102-0100), operatingsystem=Windows Server 2016 Datacenter, storagedriver=windowsfilter
└ Error: Cannot connect to the Docker daemon. Is the docker daemon running on this host?
└ UpdatedAt: 2016-11-29T15:09:50Z
└ ServerVersion: 1.13.0-rc2
worker2-1-12-2-cs2-ws-beta: 10.100.0.7:2375
└ ID: QFTP:66XY:AX3Z:P64M:5AV5:OYJT:6NTE:LML3:SFUV:NC4T:7A6U:3LE2
└ Status: Healthy
└ Containers: 2 (2 Running, 0 Paused, 0 Stopped)
└ Reserved CPUs: 0 / 1
└ Reserved Memory: 0 B / 2.1 GiB
└ Labels: kernelversion=10.0 14393 (14393.447.amd64fre.rs1_release_inmarket.161102-0100), operatingsystem=Windows Server 2016 Datacenter, storagedriver=windowsfilter
└ UpdatedAt: 2016-11-29T15:09:52Z
└ ServerVersion: 1.12.2-cs2-ws-beta
Plugins:
Volume:
Network:
Swarm:
NodeID:
Is Manager: false
Node Address:
Security Options:
Kernel Version: 4.4.0-47-generic
Operating System: linux
Architecture: amd64
CPUs: 2
Total Memory: 3.149 GiB
Name: 3e6a688d23c5
Docker Root Dir:
Debug Mode (client): false
Debug Mode (server): false
WARNING: No kernel memory limit support

And what about processes?

[10.100.0.4]:BASH $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9f5b85ed39eb microsoft/iis:nanoserver "C:\\ServiceMonitor.ex" 7 seconds ago Up 5 seconds 10.100.0.7:80->80/tcp worker2-1-12-2-cs2-ws-beta/WebServer
86f816440329 stefanscherer/swarm-windows:latest-nano "\\swarm.exe join --ad" 29 minutes ago Up 29 minutes worker2-1-12-2-cs2-ws-beta/swarm-node-1-beta

The WebServer container now runs on a different node with a different IP address, because the node that previously ran it is down from the cluster’s perspective.
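To confirm where the container landed after failover without scanning the table, you can extract the node name from the NAMES column, which standalone Swarm prefixes with the node name and a slash. A minimal sketch, assuming the docker ps layout shown above; node_of is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: reads `docker ps` output from the Swarm manager on
# stdin and prints the node hosting the container named $1.
node_of() {
  awk -v name="$1" '
    NR > 1 {                                  # skip the header row
      n = split($NF, part, "/")               # NAMES is "<node>/<container>"
      if (n == 2 && part[2] == name) print part[1]
    }
  '
}
```

Running docker ps | node_of WebServer before and after stopping Docker on a worker should show the node change from worker1-1-13-0-dev to worker2-1-12-2-cs2-ws-beta in this lab.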

Let’s check if the site is still reachable.

$ curl http://10.100.0.7:80
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>IIS Windows Server</title>
<style type="text/css">
<!--
body {
color:#000000;
background-color:#0072C6;
margin:0;
}
#container {
margin-left:auto;
margin-right:auto;
text-align:center;
}
a img {
border:none;
}
-->
</style>
</head>
<body>
<div id="container">
<a href="http://go.microsoft.com/fwlink/?linkid=66138&amp;clcid=0x409"><img src="iisstart.png" alt="IIS" width="960" height="600" /></a>
</div>
</body>
</html>

Docker’s native Swarm integration from version 1.12

Now for the explanation of why I used a development version of Docker on one of the hosts.

One of the big features of the Docker 1.12 release is the integrated Swarm mode. Naturally, I wanted to try out the new features that were being discussed all over the Web.

Docker Swarm mode is fundamentally different from the standalone Swarm used above; standalone Swarm continues to work alongside it to preserve backward compatibility.
Swarm mode’s service model provides features such as scaling, rolling updates, service discovery, load balancing and the routing mesh.

When we try one of these new features with the publicly available Docker version for Windows (1.12.2-cs2-ws-beta), everything goes smoothly until the service actually has to run.

[10.100.0.4]:BASH $ docker service create --constraint node.labels.operatingsystem==windows --name WebServer --publish 80:80 microsoft/iis:nanoserver

It tries to start but fails on the Worker with the error “Unable to locate plugin: overlay”. The problem is that this version does not support the overlay network driver on Windows, which is confirmed in an official comment.

From the Docker changelog we can see that Windows Server 2016 overlay network driver support was added in version 1.13.0. That is why I tried the development version.
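Since overlay support depends on the Engine version, a quick pre-check per node can save a failed deployment. The sketch below compares only major.minor against 1.13, the release named in the changelog. It is a shortcut under stated assumptions: it ignores the additional Windows update requirement discussed below, so passing the check does not guarantee that overlay networking works. The helper name supports_windows_overlay is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: $1 is a Docker server version string such as
# 1.13.0-rc2 or 1.12.2-cs2-ws-beta. Succeeds when major.minor >= 1.13.
# Only major and minor are compared, so 1.13.0-rc2 counts as 1.13.
supports_windows_overlay() {
  local major minor
  major=$(printf '%s' "$1" | cut -d. -f1)
  minor=$(printf '%s' "$1" | cut -d. -f2)
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 13 ]; }
}
```

From the manager, supports_windows_overlay "$(docker -H tcp://10.100.0.12:2375 version --format '{{.Server.Version}}')" queries a worker’s daemon directly on the TCP port configured earlier, bypassing the Swarm endpoint in DOCKER_HOST.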

Even though the changelog states that it requires a Windows update, and the pull request that delivered overlay network driver support explains this as well, I had to try it out.

Unfortunately, the outcome was predictable: with Docker 1.13.0-rc2, as of this writing, the service fails while creating the network gateway with the error “HNS failed with error : Catastrophic failure”.

Conclusion

If the feature set of standalone Docker Swarm is not enough for you to use it in a Windows environment, there are two options:
  • Wait for Microsoft to release an update with overlay network driver support, and then try Docker Swarm mode.
  • Check out one of the competitors on the market, such as Mesos or Kubernetes.

DevOops World … and the Universe

Answer to the Ultimate Question of Life, The Universe, and Everything in IT World
