Docker Cloud’s Swarm mode feature

TL;DR

Docker Cloud makes it really easy to deploy a Swarm on Amazon AWS or Microsoft Azure. After deploying the well-known Voting App on a Swarm created on AWS, we will attach a domain name to the cluster and then set up TLS termination using the great Traefik reverse proxy.

About Docker Cloud

Docker Cloud is a 100% web-based CaaS (Container as a Service) solution hosted by Docker.

There are 2 modes to operate this platform:

  • “legacy” mode: it lets us manage single hosts or clusters, but does not involve Swarm
  • Swarm mode: still in beta, it lets us manage native Docker Swarm clusters (currently free while in beta :) )

In this article, we will activate the Swarm mode switch from Docker Cloud’s interface.

Docker Cloud legacy mode vs Swarm mode

Setting up the link with AWS

In order to create resources on AWS, we need to link our Docker account with the AWS one. This is done from the Service Providers menu on the left.

Note: AWS and Microsoft Azure are the two options available at the time of this writing (August 2017).

If we click on the plug icon in the Amazon Web Services menu, a popup appears. It requests an ARN (Amazon Resource Name) to identify the account.

In order to get this ARN, we must follow the instructions provided by Amazon as detailed below.

We will not go through the whole process here, but basically we need to create a new role and attach a policy to it. The ARN of this role is the one to provide to Docker Cloud.
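The same role can also be created from the command line; here is a rough sketch with the AWS CLI. The role name is arbitrary, and both DOCKER_ACCOUNT_ID and the attached policy are placeholders: use the exact values given in Docker's instructions.

```shell
# Trust policy allowing Docker Cloud's AWS account to assume the role
# (DOCKER_ACCOUNT_ID is a placeholder for the account ID Docker documents)
cat > trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "AWS": "arn:aws:iam::DOCKER_ACCOUNT_ID:root" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
aws iam create-role --role-name dockercloud-swarm-role \
    --assume-role-policy-document file://trust.json
# Attach the permissions policy required by Docker Cloud (placeholder here)
aws iam attach-role-policy --role-name dockercloud-swarm-role \
    --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess
# Retrieve the ARN to paste into the Docker Cloud popup
aws iam get-role --role-name dockercloud-swarm-role \
    --query Role.Arn --output text
```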

Creation of the Swarm

There are 2 ways to manage a Swarm through Docker Cloud; we can either

  • link Docker Cloud to an existing Swarm
  • create a new Swarm

The interface below shows both options.

In this article we will create a Swarm on AWS. The minimum information to provide is the following:

  • selection of the cloud provider
  • selection of the region
  • size of the Swarm: we will go for a small 3-node cluster. We will define each node as a manager to have a production-like setup. By default, a manager node is also a worker.

We keep the default values for the other fields (such as the type of the nodes and the storage size). We also leave the CloudWatch option checked so we have all the services' logs directly in AWS.

Note: if you do not have any key pairs, you will need to create one first. This will allow you to connect to the nodes through ssh if you need to.
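If you prefer the command line, a key pair can be created with the AWS CLI (the key name below is arbitrary):

```shell
# Create a key pair and save the private key locally
aws ec2 create-key-pair --key-name swarm-keypair \
    --query KeyMaterial --output text > swarm-keypair.pem
chmod 400 swarm-keypair.pem
```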

After a couple of minutes, the time for the VMs to boot and for the Swarm to be set up, we have a working cluster.

Under the hood

As our Swarm is created on AWS, we benefit from the close integration between Docker and Amazon. Several pieces of the underlying infrastructure are automatically set up for us, among them:

  • VPC
  • Load Balancer
  • security groups

The following screenshots show the EC2 instances created and the Load Balancer balancing traffic between those instances.

Note: if you need to ssh into one node of the cluster, you need to use the user named docker.

$ ssh -i PATH_TO_SSH_KEY docker@NODE_EXTERNAL_DNS
Welcome to Docker!

Accessing the Swarm

Once the swarm is created, we can click on its name from the Docker Cloud interface. From there, there are 2 ways to access it.

  • run a container locally based on the dockercloud/client image

Once the container is running, we log in with our Docker Cloud credentials and export an environment variable indicating the URL the Docker client should use to talk to the Swarm. Under the hood, a port is opened on the local machine and the traffic is forwarded to the Swarm.
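A sketch of this flow (the swarm name and the local port below are illustrative; the Docker Cloud interface displays the exact command to run):

```shell
# Run the client container; once logged in with the Docker Cloud
# credentials, it opens a local port forwarding traffic to the Swarm
docker run --rm -ti -v /var/run/docker.sock:/var/run/docker.sock \
    dockercloud/client lucj/swarm01
# The container prints the variable to export, something like:
export DOCKER_HOST=tcp://127.0.0.1:32768
# From now on, the local Docker client talks to the Swarm
docker node ls
```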

  • use Docker for Mac or Docker for Windows directly to connect to the Swarm from the desktop.

If we click on the name of our Swarm, a new terminal is launched. This terminal is configured to send commands to the Swarm (through the Load Balancer).

Note: in the screenshot above, we can notice that the command docker node ls does not show exactly the same results each time: the node targeted changes between the calls. This behavior results from the Load Balancer set up while the Swarm was created: each request is sent to one of the managers in a round-robin way.

Deploying an application

Now that our Swarm is ready, we will deploy the well-known Voting App on it.

We will use the docker-stack.yml file available from the public GitHub repository. We will just modify the definition of the networks so they are flagged as external. Doing so allows us to create the networks beforehand and then attach the application's services to them.

version: "3"
services:

  redis:
    image: redis:alpine
    ports:
      - "6379"
    networks:
      - frontend
    deploy:
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure

  db:
    image: postgres:9.4
    volumes:
      - db-data:/var/lib/postgresql/data
    networks:
      - backend
    deploy:
      placement:
        constraints: [node.role == manager]

  vote:
    image: dockersamples/examplevotingapp_vote:before
    ports:
      - 5000:80
    networks:
      - frontend
    depends_on:
      - redis
    deploy:
      replicas: 2
      update_config:
        parallelism: 2
      restart_policy:
        condition: on-failure

  result:
    image: dockersamples/examplevotingapp_result:before
    ports:
      - 5001:80
    networks:
      - backend
    depends_on:
      - db
    deploy:
      replicas: 1
      update_config:
        parallelism: 2
        delay: 10s
      restart_policy:
        condition: on-failure

  worker:
    image: dockersamples/examplevotingapp_worker
    networks:
      - frontend
      - backend
    deploy:
      mode: replicated
      replicas: 1
      labels: [APP=VOTING]
      restart_policy:
        condition: on-failure
        delay: 10s
        max_attempts: 3
        window: 120s
      placement:
        constraints: [node.role == manager]

  visualizer:
    image: dockersamples/visualizer:stable
    ports:
      - "8080:8080"
    stop_grace_period: 1m30s
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock"
    deploy:
      placement:
        constraints: [node.role == manager]

networks:
  frontend:
    external: true
  backend:
    external: true

volumes:
  db-data:

Creation of the networks

[swarm01] ~ $ docker network create --driver overlay frontend
[swarm01] ~ $ docker network create --driver overlay backend

Deployment of the stack

We can now deploy the application through the stack file.

[swarm01] ~ $ docker stack deploy -c docker-stack.yml voting
Creating network voting_default
Creating service voting_worker
Creating service voting_visualizer
Creating service voting_redis
Creating service voting_db
Creating service voting_vote
Creating service voting_result

We can then check that the stack was correctly created and list the services available on the Swarm.
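For instance:

```shell
# List the stacks deployed on the Swarm
docker stack ls
# List the services of the voting stack
docker stack services voting
# Or list all the services with their replica counts
docker service ls
```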

If we come back to our AWS management console, there is an interesting thing to note: the ports exposed by the application (5000, 5001, 8080) were automatically opened on the Load Balancer (AWS ELB). Previously, only port 7 (echo, used for the health checks) and port 2376 (Swarm management) were open to the outside.

From the outside, using the DNS name of the Load Balancer, we can access the different web interfaces of the Voting App:

  • 5000: voting interface
  • 5001: result interface
  • 8080: visualizer, which provides a clear visualization of the distribution of the tasks across the nodes
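Each interface can be checked from a terminal with curl, using the DNS name of the Load Balancer (placeholder below):

```shell
LB=swarm01-XXX.eu-west-1.elb.amazonaws.com   # Load Balancer DNS name
for port in 5000 5001 8080; do
  # Print only the HTTP status code returned by each interface
  curl -s -o /dev/null -w "port $port -> HTTP %{http_code}\n" "http://$LB:$port/"
done
```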

Pointing a domain name to our Swarm

In this part, I have created a couple of subdomains (vote, result and viz) and defined CNAME records so they point to the DNS name of the Load Balancer.

result 10800 IN CNAME swarm01-XXX.eu-west-1.elb.amazonaws.com.
viz 10800 IN CNAME swarm01-XXX.eu-west-1.elb.amazonaws.com.
vote 10800 IN CNAME swarm01-XXX.eu-west-1.elb.amazonaws.com.

After a couple of hours the DNS propagation is done and the subdomains can be used. We can then have access to the interfaces through those subdomains.
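The propagation can be checked with dig; each subdomain should resolve to the DNS name of the Load Balancer:

```shell
# Print the CNAME target of each subdomain
for sub in vote result viz; do
  dig +short CNAME $sub.lucarea.com
done
```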

Thanks to the routing mesh, if there is no task of the vote service running on the node targeted by the load balancer, traffic will automatically be forwarded to an existing task.

Setting up a TLS termination with Traefik

Services are currently reachable by specifying the port number in the URL. We will change this behavior and only let traffic reach our Swarm through ports 80/443.

Træfik (pronounced like traffic) is a modern HTTP reverse proxy and load balancer made to deploy microservices with ease.

In this part we will see how to use Traefik as a reverse proxy and a TLS termination in front of our Swarm. It will be in charge of

  • setting up the TLS certificates through Let’s Encrypt
  • forwarding the traffic to the correct backend service based on rules on the Host header of the incoming requests

Traefik configuration

We start by creating a configuration file for Traefik. It follows the TOML format and contains the following information:

  • debug level (just in case)
  • address of the Traefik web interface
  • definition of 2 entrypoints, http and https. Traffic on http is automatically redirected to https
  • declaration of the backend used: swarm mode here
  • Let’s Encrypt configuration: Traefik will automatically get certificates for the domains listed

# traefik.toml

# Global configuration
debug = true
logLevel = "DEBUG"
defaultEntryPoints = ["http", "https"]

# Web interface configuration
[web]
address = ":8080"

# Entrypoints: traffic on http is redirected to https
[entryPoints]
  [entryPoints.http]
  address = ":80"
    [entryPoints.http.redirect]
    entryPoint = "https"
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]

# Docker Swarm configuration backend
[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "lucarea.com"
watch = true
swarmmode = true
exposedbydefault = false

# Let’s Encrypt configuration
[acme]
email = "xxx"
storageFile = "acme.json"
entryPoint = "https"
[[acme.domains]]
  main = "lucarea.com"
  sans = ["vote.lucarea.com", "result.lucarea.com", "viz.lucarea.com"]

We then create a Dockerfile which uses the official Traefik image and adds the configuration file above.

# Dockerfile
FROM traefik:1.3.5
COPY traefik.toml /etc/traefik/traefik.toml

We can then build the image and push it to Docker Hub (it will end up as a public image there).

$ docker image build -t lucj/voting-proxy:1.0 .
Sending build context to Docker daemon 3.584kB
Step 1/2 : FROM traefik:1.3.5
1.3.5: Pulling from library/traefik
df350fade9bb: Pull complete
16fac1cc7b67: Pull complete
Digest: sha256:f502c5f275f1518dcc27e63d7733b9c654a8f8ecfda2c41bacca687f20131eaa
Status: Downloaded newer image for traefik:1.3.5
---> 088bcecd63de
Step 2/2 : COPY traefik.toml /etc/traefik/traefik.toml
---> e8853944ed06
Removing intermediate container 15c8ed650f68
Successfully built e8853944ed06
Successfully tagged lucj/voting-proxy:1.0
$ docker image push lucj/voting-proxy:1.0
The push refers to a repository [docker.io/lucj/voting-proxy]
c40d8a836fbc: Pushed
15ab91aee858: Mounted from library/traefik
3b10c345cfdd: Mounted from library/traefik
1.0: digest: sha256:c67b887681c24ba3...768736cb28bb size: 946

Traefik service

We can now define a service dedicated to Traefik. It publishes ports 80, 443 and 8081 (mapped to Traefik's web interface on port 8080). The Docker socket is mounted in order to watch the Swarm's events. This service uses the network named frontend created earlier.

# traefik.yaml
version: "3.3"
services:
  traefik:
    image: lucj/voting-proxy:1.0
    networks:
      - frontend
    ports:
      - "80:80"
      - "443:443"
      - "8081:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      replicas: 1
      labels:
        - "traefik.enable=true"
      placement:
        constraints: [node.role == manager]
      restart_policy:
        condition: on-failure
networks:
  frontend:
    external: true

Before running the Traefik proxy, we will make a couple of changes in the definition of the application within the docker-stack.yml file.

Adding Traefik’s labels to the services

In order to configure Traefik and define routing rules, we can use the .toml configuration file, or we can set labels directly on the services. This second approach is much more dynamic; it is the one we will illustrate with the service named vote.

We modify the definition of the service so it uses labels.

  vote:
    image: dockersamples/examplevotingapp_vote:before
    networks:
      - frontend
    depends_on:
      - redis
    deploy:
      replicas: 2
      labels:
        - "traefik.enable=true"
        - "traefik.frontend.rule=Host:vote.lucarea.com"
        - "traefik.frontend.entryPoints=http,https"
        - "traefik.backend=vote"
        - "traefik.port=5000"
      update_config:
        parallelism: 2
      restart_policy:
        condition: on-failure

This basically tells Traefik to “forward” all requests targeting vote.lucarea.com to the service vote on port 5000. At the same time, we also removed the publication of the service's port, as we only want the service to be accessible from the outside through Traefik.

Putting everything together

In order to take all the changes into account, we remove the running voting stack and deploy it again with the modified file. Then, we run the service dedicated to Traefik.

$ docker stack deploy -c docker-stack.yml voting
$ docker stack deploy -c traefik.yaml tls

At this point, all the requests targeting http://vote.lucarea.com are redirected to https://vote.lucarea.com using the certificates generated through Let's Encrypt. Traefik then forwards the traffic to the dedicated backend (the vote service).
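A quick way to verify this behavior from a terminal:

```shell
# The http entrypoint should answer with a redirect towards https
curl -sI http://vote.lucarea.com | head -n 3
# The https endpoint should serve the voting interface with a valid certificate
curl -sI https://vote.lucarea.com | head -n 3
```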

Note: there may occasionally be a problem with the Traefik configuration when services use several backends. In this case, a 502 / Bad Gateway status code is returned. I still need to figure this out.

Interesting point here: Traefik provides a web interface which lists the frontend rules (for instance, the value of the Host header of the incoming request) and the related backend the traffic is forwarded to.

We have only shown the vote service here, but the same changes could be applied to the result and visualizer services.
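As an illustration, the definition of the result service could receive similar labels (a sketch, not tested; note that Traefik is attached to the frontend network only, so the service needs to be reachable on that network as well):

```yaml
  result:
    image: dockersamples/examplevotingapp_result:before
    networks:
      - frontend   # network Traefik is attached to
      - backend
    depends_on:
      - db
    deploy:
      replicas: 1
      labels:
        - "traefik.enable=true"
        - "traefik.frontend.rule=Host:result.lucarea.com"
        - "traefik.frontend.entryPoints=http,https"
        - "traefik.backend=result"
        - "traefik.port=5001"
      restart_policy:
        condition: on-failure
```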

Delete the Swarm

In order to delete everything we have set up so far, we just need to go back to the Docker Cloud interface and Terminate the Swarm.

Summary

In a very short time, we have created a Swarm on AWS using Docker Cloud, deployed a micro-services application, configured a domain name, and added Traefik to handle the TLS termination. In a future article, I'll go deeper into the configuration of Traefik.