Deploy Docker containers with Zero Downtime

limone.eth
Oct 19, 2020

Disclaimer

If you are reading this article, I assume that you are already familiar with Docker and its main features and commands. Otherwise, I suggest reading some introductory articles on the topic first.

Introduction

In recent years, web services have spread everywhere, and online traffic has grown with them. High levels of scalability, availability and flexibility are becoming a de facto standard. In this context, Docker helps developers by simplifying the infrastructure work needed to meet these requirements and by easing the deployment process.

Using Docker images and containers, developers can create lightweight, standalone applications that can be easily replicated on different nodes while avoiding environment compatibility issues. In this way, horizontal scalability can be achieved without much effort: we can create multiple instances of our application, so that the incoming traffic is distributed across all of them. Of course, we must put in place a Load Balancer, a network component that is in charge of distributing incoming traffic to different targets. Once that's done, our infrastructure will be able to support a much larger workload.

Furthermore, using a Load Balancer improves the availability of our application. Whenever a node goes down and can no longer receive requests, those requests are processed by the other nodes that are still active. Moreover, in terms of deployment, it is important to achieve zero downtime (to keep availability high). Using a single container forces us to stop and restart it whenever an update is required, which causes downtime of variable length. Instead, having a Load Balancer allows us to redeploy one container at a time so that the service always remains available.

Out of the box, a standalone Docker container has no Load Balancer in front of it, so we need to do some extra work to get one. As we will see, Docker Swarm comes to our aid.

Docker Swarm

Docker Swarm, as an orchestrator, extends what standard containers can do by providing tools for scaling, networking, securing and maintaining your containerized applications.

In Swarm, we do not create individual containers ourselves. All Swarm workloads are scheduled as services, which are scalable groups of identical containers (replicas) with additional networking features maintained directly by Swarm; a group of related services deployed together is called a stack. By using Docker services instead of starting and stopping Docker containers manually, we can perform zero-downtime deployments.

Zero Downtime Deployments — In Practice

Let’s start with our practical example.

Initializing the Swarm and Setting up the project

First of all, we should start a Docker Swarm on our local machine (for the sake of the tutorial I’ll do everything locally, without considering a cluster with remote nodes).

$ docker swarm init

Then we create a simple Node.js application, with just one file:
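The original listing is not reproduced here, so what follows is only a minimal sketch of what such a one-file application could look like, not the author's actual code (the repository linked at the end of the article holds the real file). The file name index.js, the response text and the SIGTERM handler are all assumptions made for illustration; only port 3000 comes from the rest of the article.

```shell
# Write a minimal, illustrative index.js (assumed file name) to the current directory.
cat > index.js <<'EOF'
// Minimal HTTP server; a sketch, not the original application.
const http = require('http');
const os = require('os');

const PORT = 3000; // the port used throughout the article

const server = http.createServer((req, res) => {
  // Replying with the hostname makes it visible which replica answered.
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end(`Hello from ${os.hostname()}\n`);
});

server.listen(PORT, () => console.log(`Listening on port ${PORT}`));

// Stop accepting connections and exit when the container is asked to stop.
process.on('SIGTERM', () => server.close(() => process.exit(0)));
EOF
```

Returning the hostname is a design choice for this tutorial: inside a container the hostname is the container ID, so repeated requests make it easy to see traffic being spread across replicas.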

We write a simple Dockerfile to build the associated image.
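The Dockerfile listing is not shown here either. A minimal sketch, assuming the application is a single file called index.js with no npm dependencies and that an Alpine-based Node.js image is acceptable (the base image tag and the /app working directory are assumptions, not taken from the article):

```shell
# Write an illustrative Dockerfile next to the application file.
cat > Dockerfile <<'EOF'
# Base image is an assumption; any recent Node.js image should work.
FROM node:lts-alpine
WORKDIR /app
# Copy the single application file (assumed name) into the image.
COPY index.js .
# Document the port the application listens on.
EXPOSE 3000
CMD ["node", "index.js"]
EOF
```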

We can now build our image:

$ docker build -t docker-zero-downtime-deployment .

To ensure that all the containers that we are going to spawn use the same image, we should push it to a docker registry. In this article, we will use a local registry.
We can run a local docker registry on port 5000 by typing:

$ docker run -d -p 5000:5000 --name registry registry:2

Then we need to tag and push our image to the registry.

$ docker tag docker-zero-downtime-deployment localhost:5000/docker-zero-downtime-deployment
$ docker push localhost:5000/docker-zero-downtime-deployment

At this point, our Docker image is available in the registry at localhost:5000, and our stack will be able to pull it from there.

Writing the docker-compose file

Now we can write our docker-compose file (don't worry, I will go through each line of it):
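The original file is not reproduced here, so the following is a hedged reconstruction based only on the values mentioned in this article: 4 replicas, port 3000, two containers updated at a time with start-first ordering, rollback on failure, a 10-second delay, a 10-second healthcheck timeout and a 40-second start period. Everything else is an assumption made for illustration: the service name web, the start command, the rollback_config and restart_policy values, the healthcheck interval and retries, and the use of wget (which presumes an Alpine-based image).

```shell
# Write an illustrative docker-compose.yml; values marked "assumed" are not from the article.
cat > docker-compose.yml <<'EOF'
version: "3.8"
services:
  web:                          # assumed service name
    image: localhost:5000/docker-zero-downtime-deployment
    ports:
      - "3000:3000"
    command: ["node", "index.js"]   # assumed entry file
    deploy:
      replicas: 4
      update_config:
        parallelism: 2
        order: start-first
        failure_action: rollback
        delay: 10s
      rollback_config:
        parallelism: 0          # assumed: roll back all tasks at once
        order: stop-first       # assumed
      restart_policy:
        condition: on-failure   # assumed
        delay: 5s               # assumed
        max_attempts: 3         # assumed
        window: 120s            # assumed
    healthcheck:
      # assumed probe; requires wget inside the image
      test: ["CMD", "wget", "-q", "--spider", "http://127.0.0.1:3000/"]
      interval: 30s             # assumed
      timeout: 10s
      retries: 3                # assumed
      start_period: 40s
EOF
```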

First of all, there are the common keys: the image (which is pulled from our registry), the ports and the starting command. Then, under the deploy key, we see:

  • replicas: specifies the number of containers that should be running at any given time
  • update_config: configures how the service should be updated (this is useful for configuring rolling updates)
  • rollback_config: configures how the service should be rolled back in case of a failing update.
  • restart_policy: configures if and how to restart containers when they exit.

As for update_config and rollback_config, several parameters (shared by the two configurations) are available to customize those processes.

  • parallelism: the number of containers to update at a time.
  • order: order of operations during updates. One of stop-first (the old task is stopped before starting a new one), or start-first (the new task is started first, and the running tasks briefly overlap).
  • failure_action: what to do if an update fails. One of continue, rollback, or pause (default: pause). Inside rollback_config, only continue and pause are valid.
  • max_failure_rate: failure rate to tolerate during an update.
  • delay: the time to wait between updating a group of containers.
  • monitor: duration after each task update to monitor for failure (ns|us|ms|s|m|h) (default 5s).

In our case, update_config allows two containers to be updated at a time, starting the new tasks first and then removing the old ones. By setting failure_action to rollback, we are saying that a failed update should be reverted according to rollback_config. Finally, there is a delay of 10 seconds between one group of container updates and the next.

Regarding restart_policy, we have four parameters available:

  • condition: one of none, on-failure or any.
  • delay: how long to wait between restart attempts.
  • max_attempts: how many times to attempt to restart a container before giving up.
  • window: how long to wait before deciding if a restart has succeeded.

After that, we see the healthcheck parameter. It gives the Docker service a way to check whether the replicas are healthy. During a container update, Swarm considers the update successful only if the new container becomes healthy. If it does not, the update does not proceed to the other replicas and, with failure_action set to rollback, the starting state is restored (the old containers remain alive). A simple healthcheck consists of a test command that performs the check. We can further customize it by specifying an interval (how often the health check is performed), a timeout (e.g., if the replica is still not responding after 10 seconds, the check counts as failed), the number of retries (consecutive failed checks needed before the replica is considered unhealthy) and the start_period (e.g., a 40s grace period after creation during which failed checks do not count towards the retries).

Deploy the stack

Now, we are set up with all the configuration files that we need to deploy our docker service and its replicas with zero downtime. We just have to try it!

$ docker stack deploy -c docker-compose.yml docker-zero

If everything is working, you should now see 4 new containers running and exposing port 3000 (use docker ps to check), and one service with 4 replicas (use docker service ls to check). The latter command also shows how many replicas are running out of the desired number.

‘docker ps’ results after the first deploy

Now, each time an HTTP request is received on port 3000, Swarm's routing mesh forwards it to one of the service's containers, acting as a Load Balancer.
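One way to observe this behaviour, and later to check that a redeploy really causes no downtime, is to send a stream of requests and count the ones that do not return 200. The helpers below are a small sketch under stated assumptions: the function names probe_once, tally and probe are hypothetical, curl is assumed to be installed, and the service is assumed to answer on http://localhost:3000/.

```shell
# Print the HTTP status of one request, one code per line ("000" means no response).
probe_once() {
  curl -s -o /dev/null -w '%{http_code}\n' --max-time 2 "$1" || true
}

# Read status codes from stdin and count 200s versus everything else.
tally() {
  awk '{ if ($1 == "200") ok++; else failed++ }
       END { printf "ok=%d failed=%d\n", ok, failed }'
}

# Send N requests to URL, pausing briefly between them, and summarize the results.
probe() {
  n="$1"; url="$2"; i=0
  while [ "$i" -lt "$n" ]; do
    probe_once "$url"
    i=$((i + 1))
    sleep 0.2
  done | tally
}
```

For example, running probe 100 http://localhost:3000/ in one terminal while a new version is being deployed in another gives a rough picture: a final count with no failed requests is consistent with a zero-downtime update, although it is not a proof.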

Then, whenever we update our image with new code or features, we need to update our containers. To do that, we push the new image to the registry and run the stack deploy command again. This is what should happen:

‘docker ps’ results after the second deploy

As you can see, here we have 4 healthy containers and 2 more containers that are starting. As soon as the new containers become healthy, 2 of the old ones will be replaced, and then 2 new containers will spawn. Again, once they become healthy, the 2 remaining old ones will be replaced. At this point, we will have 4 new containers running.
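The whole update cycle can be wrapped in a small helper. This is only a sketch that chains the commands already used in this article; the redeploy function and the DOCKER variable are hypothetical names, and the variable exists only so the sequence can be dry-run by pointing it at echo instead of the real CLI.

```shell
# Command used to talk to Docker; override with DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# Image and stack names as used earlier in the article.
IMAGE="localhost:5000/docker-zero-downtime-deployment"
STACK="docker-zero"

# Rebuild the image, push it to the local registry and trigger the rolling update.
# Each step runs only if the previous one succeeded.
redeploy() {
  "$DOCKER" build -t "$IMAGE" . &&
  "$DOCKER" push "$IMAGE" &&
  "$DOCKER" stack deploy -c docker-compose.yml "$STACK"
}
```

Calling DOCKER=echo redeploy prints the three commands without running them, which is a cheap way to review the sequence before calling redeploy for real.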

Conclusion

We have seen a simple method to deploy Docker containers with zero downtime using Docker Swarm. By tweaking our docker-compose file, we can customize and scale the deployment process according to our needs.

Github Repository

https://github.com/simonestaffa/docker-zero-downtime-deployments

limone.eth

Backend Engineer @Backdrop | Co-Founder @urbe.eth | Organizing ETHRome