Deploying Docker Swarm with Ansible

Create and manage a Docker Swarm cluster using Ansible.

The release of Docker 1.12 introduced a number of improvements, one of which is a much simpler way to create a Swarm cluster from scratch. The simplified process is roughly as follows:

  1. Initializing the first Swarm manager node (this generates the join tokens used when adding other hosts).
  2. Adding additional manager and worker nodes to the cluster using the tokens from the step above.
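For reference, the manual version of those two steps boils down to a handful of commands (the IP address here is just an example):

# on the first manager node
$ docker swarm init --advertise-addr 10.0.0.2

# print the token needed to join workers (use 'manager' for manager nodes)
$ docker swarm join-token -q worker

# on each additional node
$ docker swarm join --token <token-from-above> 10.0.0.2:2377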

Pretty simple, huh? While this certainly makes manually bootstrapping a cluster easier, most of us are looking for ways to automate it. In this guide, we’ll use Ansible to fully automate bootstrapping a new cluster, and the same playbook will also handle adding new hosts to the cluster.

Setup

The only things you’ll need are hosts with a Docker daemon running and a functioning Ansible environment with your Docker hosts added to an inventory. There is a lot of great documentation available for getting this set up if you are new to Ansible.

This playbook assumes that the hosts dedicated as Swarm managers belong to an Ansible inventory group named ‘manager’ and that the rest belong to a group named ‘worker’.
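A minimal inventory matching those assumptions might look like this (hostnames are placeholders):

[manager]
swarm-manager-01
swarm-manager-02
swarm-manager-03

[worker]
swarm-worker-01
swarm-worker-02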

Playbook

I’ve called the playbook swarm.yml. Before we get into the weeds, the playbook is divided into the following high-level steps:

  1. Determine the Swarm status of each manager node and classify it as either “operational” or in need of bootstrapping.
  2. Do the same status check and classification for each worker node.
  3. If no manager is running in Swarm mode, then this must be a new cluster that needs to be bootstrapped. Take a single manager node and run the docker swarm init command to create a new cluster.
  4. Retrieve the manager and worker Swarm join tokens from a single, operational manager node.
  5. Join any manager nodes that are not currently part of the cluster using the manager join token.
  6. Do the same for worker nodes using the worker join token.

Determine the Status of the Cluster

From the CLI, you can determine if a Docker daemon is running in Swarm mode by running docker info:

$ docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
... snip ...
Swarm: inactive # <<< this

Here we see that Docker is not running in Swarm mode and is not part of any cluster. When a Docker daemon is running in Swarm mode, the output will look like the following:

$ docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
... snip ...
Swarm: active
 NodeID: 7x3tddrjpmvcjllxxlujldck6
 Is Manager: true
 ClusterID: ajbtwndmq08i6usbau3fzsbhd
 Managers: 1
 Nodes: 1
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 10.0.0.2

The following play runs this command on each manager node to see whether it is currently participating in a cluster, and classifies the nodes into two groups: operational and bootstrap:

- hosts: manager
  become: true
  tasks:
    - name: determine swarm status
      shell: >
        docker info | egrep '^Swarm: ' | cut -d ' ' -f2
      register: swarm_status

    - name: create swarm_manager_operational group
      add_host:
        hostname: "{{ item }}"
        groups: swarm_manager_operational
      with_items: "{{ play_hosts }}"
      when: "'active' in hostvars[item]['swarm_status']['stdout_lines']"
      run_once: true

    - name: create swarm_manager_bootstrap group
      add_host:
        hostname: "{{ item }}"
        groups: swarm_manager_bootstrap
      with_items: "{{ play_hosts }}"
      when: "'active' not in hostvars[item]['swarm_status']['stdout_lines']"
      run_once: true

Here the add_host module is used to create ad hoc Ansible groups that will be used in later plays performing tasks/roles specific to that group. I like this pattern over a series of when statements in a role since you get to operate based on state facts about each group. For example, we know the hosts in the swarm_manager_operational group are running in Swarm mode because we determined their state using the docker info command.

Next, we are going to do the exact same thing for the worker hosts:

- hosts: worker
  become: true
  tasks:
    - name: determine swarm status
      shell: >
        docker info | egrep '^Swarm: ' | cut -d ' ' -f2
      register: swarm_status

    - name: create swarm_worker_operational group
      add_host:
        hostname: "{{ item }}"
        groups: swarm_worker_operational
      with_items: "{{ play_hosts }}"
      when: "'active' in hostvars[item]['swarm_status']['stdout_lines']"
      run_once: true

    - name: create swarm_worker_bootstrap group
      add_host:
        hostname: "{{ item }}"
        groups: swarm_worker_bootstrap
      with_items: "{{ play_hosts }}"
      when: "'active' not in hostvars[item]['swarm_status']['stdout_lines']"
      run_once: true

Notice that the only things that have changed are the hosts we are targeting and the groups we are putting them into. The workers are placed in different groups so that we can join them to the cluster using a different token.

Optionally Bootstrap a Cluster

When none of the managers are running in Swarm mode (Swarm: inactive), a new cluster needs to be created. The init command only needs to be run once, on a single node. If you try to run it again on a node that is already part of an active cluster, it will fail because the node is already running in Swarm mode.
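Running init on a node that is already part of a cluster fails with an error along these lines:

$ docker swarm init
Error response from daemon: This node is already part of a swarm. Use "docker swarm leave" to leave this swarm and join another one.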

- hosts: swarm_manager_bootstrap[0]
  become: true
  tasks:
    - name: initialize swarm cluster
      shell: >
        docker swarm init
        --advertise-addr={{ swarm_iface | default('eth0') }}:2377
      when: "'swarm_manager_operational' not in groups"
      register: bootstrap_first_node

    - name: add initialized host to swarm_manager_operational group
      add_host:
        hostname: "{{ play_hosts[0] }}"
        groups: swarm_manager_operational
      when: bootstrap_first_node | changed

This play begins by targeting the first node in the swarm_manager_bootstrap group. The first task runs the init command when the swarm_manager_operational group is empty, or put literally, not present in the available Ansible groups. The bootstrap_first_node variable is registered specifically for the next task: if the init task ran and a new cluster was bootstrapped (bootstrap_first_node | changed), that node is added to the group of operational Swarm managers (swarm_manager_operational).

This play is idempotent: its tasks only execute when there are no hosts in the swarm_manager_operational group.

Retrieve the Join Tokens

The next play also targets only a single node, this time in the swarm_manager_operational group, to retrieve and register the manager and worker tokens required when joining a node to the cluster. Later in the playbook we will need easy access to the IP address of one of the manager hosts, so we store that information by creating another ad hoc group containing just the manager’s IP (there are certainly other ways of storing and accessing a single manager’s IP address; I just think this is easier).

- hosts: swarm_manager_operational[0]
  become: true
  vars:
    iface: "{{ swarm_iface | default('eth0') }}"
  tasks:
    - name: retrieve swarm manager token
      shell: docker swarm join-token -q manager
      register: swarm_manager_token

    - name: retrieve swarm worker token
      shell: docker swarm join-token -q worker
      register: swarm_worker_token

    - name: populate list of manager ips
      add_host:
        hostname: "{{ hostvars[item]['ansible_' + iface]['ipv4']['address'] }}"
        groups: swarm_manager_ips
      with_items: "{{ play_hosts }}"
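The -q flag prints only the raw token, which is what gets registered. Join tokens are opaque strings shaped like this (illustrative, not from a real cluster):

$ docker swarm join-token -q manager
SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c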

Join Manager Nodes

The manager hosts that are not running in Swarm mode (swarm_manager_bootstrap group) will now be added to the cluster using the Swarm manager token.

- hosts: swarm_manager_bootstrap:!swarm_manager_operational
  become: true
  vars:
    token: "{{ hostvars[groups['swarm_manager_operational'][0]]['swarm_manager_token']['stdout'] }}"
  tasks:
    - name: join manager nodes to cluster
      shell: >
        docker swarm join
        --advertise-addr={{ swarm_iface | default('eth0') }}:2377
        --token={{ token }}
        {{ groups['swarm_manager_ips'][0] }}:2377

The hosts line may look a little odd, since you would expect that no host in swarm_manager_operational would also be in swarm_manager_bootstrap. The exclusion exists to handle the case where a new cluster is being created: a host is first classified as needing bootstrap, and after starting the new cluster, it gets added to the swarm_manager_operational group. Without the exclusion, that host would try to join the cluster it just created.

The manager token is then accessed and used for joining the manager nodes to the cluster using the docker swarm join command.
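For comparison, this is equivalent to typing the following by hand on each new manager (the token and IP are placeholders):

$ docker swarm join --token <manager-token> 10.0.0.2:2377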

Join Worker Nodes

The final play looks very similar to the previous one, except that it targets the worker nodes and joins them using the worker token.

- hosts: swarm_worker_bootstrap
  become: true
  vars:
    token: "{{ hostvars[groups['swarm_manager_operational'][0]]['swarm_worker_token']['stdout'] }}"
  tasks:
    - name: join worker nodes to cluster
      shell: >
        docker swarm join
        --advertise-addr={{ swarm_iface | default('eth0') }}:2377
        --token={{ token }}
        {{ groups['swarm_manager_ips'][0] }}:2377

Improving the Status Check with a Custom Module

Using the shell module and searching for a value in STDOUT works in a pinch, but it can be unreliable, especially if the output format ever changes. Instead, we can write a custom Ansible module that injects the Docker info as facts. This gives much greater control and predictability over how the information is evaluated later.

Previously we used the following shell statement for determining if hosts are running in Swarm mode:

- hosts: manager
  become: true
  tasks:
    - name: determine swarm status
      shell: docker info | egrep '^Swarm: ' | cut -d ' ' -f2
      register: swarm_status

We can instead use a module to inject the same Docker info as facts (in dictionary form):

#!/usr/bin/env python
# Copyright 2016, This End Out, LLC.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

DOCUMENTATION = """
---
module: docker_info_facts
short_description:
  - A module for injecting Docker info as facts.
description:
  - A module for injecting Docker info as facts.
author: nextrevision
"""

EXAMPLES = """
- name: load docker info facts
  docker_info_facts:
"""

try:
    from docker import Client
except ImportError:
    docker_lib_missing = True
else:
    docker_lib_missing = False


def _get_docker_info():
    # Query the local Docker daemon for the same data as `docker info`.
    try:
        cli = Client()
        return cli.info(), False
    except Exception as e:
        return {}, str(e)


def main():
    module = AnsibleModule(
        argument_spec=dict(),
        supports_check_mode=False
    )

    if docker_lib_missing:
        msg = "Could not load docker python library; please install docker-py"
        module.fail_json(msg=msg)

    info, err = _get_docker_info()

    if err:
        module.fail_json(msg=err)

    # Expose the info dictionary to subsequent tasks as `docker_info`.
    module.exit_json(
        changed=True,
        ansible_facts={'docker_info': info})


from ansible.module_utils.basic import *
if __name__ == '__main__':
    main()

This module makes use of the docker-py library to programmatically access the Docker info and inject it as a dictionary fact. Using this module, our play will now look like:

- hosts: manager
  become: true
  tasks:
    - name: load docker info as facts
      docker_info_facts:

    - name: create swarm_manager_operational group
      add_host:
        hostname: "{{ item }}"
        groups: swarm_manager_operational
      with_items: "{{ play_hosts }}"
      when: hostvars[item]['docker_info']['Swarm']['LocalNodeState'] == 'active'
      run_once: true

    - name: create swarm_manager_bootstrap group
      add_host:
        hostname: "{{ item }}"
        groups: swarm_manager_bootstrap
      with_items: "{{ play_hosts }}"
      when: hostvars[item]['docker_info']['Swarm']['LocalNodeState'] != 'active'
      run_once: true

Now we have a much more reliable way of determining the state of our hosts. See the repo below for the full playbook and updates using this module.
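For reference, the relevant portion of the injected fact looks roughly like this; the field names come from the Docker /info API, and the values shown are illustrative:

docker_info:
  Swarm:
    LocalNodeState: active   # "inactive" on hosts not yet in a cluster
    NodeID: 7x3tddrjpmvcjllxxlujldck6
    ControlAvailable: true   # true on manager nodes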

Conclusion

This playbook is idempotent and can be run both to bootstrap a new cluster and to add capacity to an existing one. When a new host appears in your inventory under either the manager or worker group, it will be joined to the cluster on the next run.
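Running it is a single command (assuming your inventory file is named inventory):

$ ansible-playbook -i inventory swarm.yml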

The full project files can be found at github.com/nextrevision/ansible-swarm-playbook.


Originally published at thisendout.com on September 13, 2016.