Zero Downtime Deployment with Docker Swarm

Mar 23, 2020

If you are a software developer who has dealt with production software in the past, you are certainly familiar with this struggle:

deployment time

I hope you have already put measures in place to *trust your deployment*, taking advantage of great practices such as Continuous Integration, Automated Testing, Continuous Delivery and so on. These should be in place before tackling the thing I’m going to talk about, which is not going to be a silver bullet.

Today we’ll talk about zero downtime.

the problem

Imagine you have already packaged your application into a Docker image, published it to a Docker registry and have it up and running in production with the following command:

docker run …

Nothing wrong, your application is good to go.

But… how can we update it?

The easiest solution is to stop the old one and start the new version:

docker stop <running container id>
docker run <new version>

There’s a problem with this approach: between the moment you stop the old container and the moment the new version has finished bootstrapping, your application will not respond.
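If you want to see this gap for yourself, a small probe loop can count the requests that fail while you swap the containers. This is only a sketch: the `probe` function name is made up for illustration, and the URL in the usage comment assumes the application answers on port 8080 and exposes a /health endpoint.

```shell
#!/bin/sh
# probe URL [COUNT]: request URL COUNT times (default 60), one second apart,
# and print how many of those requests failed.
probe() {
    url="$1"
    count="${2:-60}"
    failed=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # -f makes curl exit non-zero on HTTP status >= 400,
        # --max-time keeps a hanging request from blocking the loop.
        curl -fs --max-time 2 -o /dev/null "$url" || failed=$((failed + 1))
        i=$((i + 1))
        sleep 1
    done
    echo "$failed"
}

# Usage (in a second terminal, while you stop the old container and start the new one):
#   probe http://localhost:8080/health 60
```

Any number greater than zero means that some requests were lost during the swap.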

the solution

This problem is pretty common and can be solved in many ways. We discovered an almost “effort free” way using Docker Swarm.

what is Docker Swarm?

Swarm is a Docker “mode” already included in your Docker installation. It’s a powerful cluster engine that will help you scale your application.

Also, it will solve your downtime problems.


The idea is to transform your Docker instance into a single-node swarm cluster:

docker swarm init

This command should return something like:

Swarm initialized: current node (<node_id>) is now a manager.

To add a worker to this swarm, run the following command:

    docker swarm join --token <swarm_token> <node_address+port>

To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.

These are the instructions for adding nodes to the cluster, but you don’t need them now.
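Running `docker swarm init` a second time on the same node fails because the node is already part of a swarm, so in a provisioning script you may want to guard the call. A minimal sketch, assuming a helper called `ensure_swarm` (a name invented here) that reads the swarm state from `docker info`:

```shell
#!/bin/sh
# ensure_swarm: initialize a single-node swarm only if this Docker engine
# is not already part of one.
ensure_swarm() {
    # LocalNodeState is "active" once the engine has created or joined a swarm.
    state="$(docker info --format '{{.Swarm.LocalNodeState}}')"
    if [ "$state" = "active" ]; then
        echo "swarm already active"
    else
        docker swarm init
    fi
}

# Usage:
#   ensure_swarm && docker node ls
```

`docker node ls` should then list a single node acting as manager.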

Now you need to deploy the stack.

In order to achieve that, you need to define a *docker-compose* file like this:

version: '3.7'

networks:
  my-network:
    external: false

services:
  my-server:
    image: ${IMAGE}
    hostname: my-server
    container_name: my-server
    ports:
      - '8080:8080'
    networks:
      - my-network
    healthcheck:
      test: ["CMD", "curl", "-i", "http://localhost:8080/health"]
    deploy:
      mode: replicated
      replicas: 2
      update_config:
        order: start-first
        failure_action: rollback
        delay: 5s

In this example there is just one service, but you can deploy as many as you need.

Some concepts:

image: a variable that holds the image name and version

networks: the stack network, which lets the various services communicate with each other

healthcheck: the command that verifies the service status (up or down). It should exit with code 0 when the service is healthy and with a non-zero code when the service has not started yet, has stopped, is paused, and so on.

replicas: the number of containers of the service that run in parallel
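A note on the health check command: plain `curl` exits with 0 as soon as it receives any HTTP response, even a 500, so on its own it mostly detects a service that is not listening yet. If you also want HTTP errors to count as unhealthy, the `-f` flag makes curl exit with a non-zero code on status 400 and above. A small sketch you can run by hand to try the command before putting it in the compose file (the `check_health` name is hypothetical, and curl has to be available inside the image for the real health check to work):

```shell
#!/bin/sh
# check_health URL: exit 0 if the endpoint answers with an HTTP status below 400,
# non-zero if the request fails or the server returns an error status.
check_health() {
    # -f: non-zero exit code on HTTP status >= 400
    # -s: no progress output, -S: still print the error message
    # --max-time: give up after 5 seconds instead of hanging
    curl -fsS --max-time 5 -o /dev/null "$1"
}

# Usage:
#   check_health http://localhost:8080/health && echo healthy || echo unhealthy
```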

You can start your service with the following command:

export IMAGE=${IMAGE_NAME}:${IMAGE_VERSION}
docker stack deploy -c docker-compose.yml <stack_name> --with-registry-auth
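The deploy command returns before the containers are actually up, so a script may want to wait until every service in the stack reports all of its replicas as running. One possible sketch, assuming a helper named `wait_for_stack` (invented for this example) that polls `docker stack services`:

```shell
#!/bin/sh
# wait_for_stack STACK [TRIES]: poll until every service of the stack shows
# "running/desired" replicas with both numbers equal (for example "2/2").
# Returns 0 when the stack is ready, 1 if it is still not ready after TRIES polls.
wait_for_stack() {
    stack="$1"
    tries="${2:-30}"
    while [ "$tries" -gt 0 ]; do
        ready=1
        replicas_list="$(docker stack services "$stack" --format '{{.Replicas}}')"
        # No output means the stack has no services (yet): not ready.
        [ -n "$replicas_list" ] || ready=0
        for replicas in $replicas_list; do
            # Compare the part before the slash with the part after it.
            [ "${replicas%%/*}" = "${replicas##*/}" ] || ready=0
        done
        if [ "$ready" -eq 1 ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 2
    done
    return 1
}

# Usage:
#   wait_for_stack <stack_name> && echo "all replicas are running"
```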

Once you are ready to update your service, run the same command with a different value of IMAGE_VERSION. There will be no downtime: with order: start-first, Swarm starts the new containers first and stops the old ones only once the new ones have started correctly.
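Since the first deploy and every later update are the same two commands, it can be handy to wrap them in one function. The following is just a sketch: `release` is a made-up name, and it assumes the compose file is called docker-compose.yml and sits in the current directory.

```shell
#!/bin/sh
# release IMAGE_NAME IMAGE_VERSION STACK_NAME
# Deploys the stack (or updates it, if it already exists) with the given image version.
release() {
    # IMAGE is the variable that the compose file reads for the image name and version.
    export IMAGE="$1:$2"
    docker stack deploy -c docker-compose.yml --with-registry-auth "$3"
    # List the tasks of the stack, including the ones that were replaced.
    docker stack ps "$3"
}

# Usage:
#   release <image_name> <new_version> <stack_name>
```

In the task list printed at the end, the containers of the previous version should appear with a desired state of Shutdown next to the running ones of the new version.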