Rollout and Rollback In Docker Swarm

TL;DR

Thanks to a couple of options, Docker Swarm makes it really easy to perform a rolling update on a running service.

Creating a Swarm

The following command is the only thing needed to move the Docker daemon into Swarm mode.

$ docker swarm init

Note: if the host has several IP addresses, pick the one the other nodes should use to reach it and pass it with the --advertise-addr flag. The command then becomes:

$ docker swarm init --advertise-addr MY_IP
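
To confirm that the daemon really switched to Swarm mode, one option is to query the local node state. This is only a sketch: the helper names swarm_state and ensure_swarm are made up for this example and are not Docker commands.

```shell
# Print the Swarm state of the local Docker daemon
# (typically "active" once `docker swarm init` has succeeded, "inactive" before).
swarm_state() {
  docker info --format '{{.Swarm.LocalNodeState}}'
}

# Hypothetical helper: initialise the Swarm only if this node is not in one yet.
ensure_swarm() {
  if [ "$(swarm_state)" = "active" ]; then
    echo "already in Swarm mode"
  else
    # Extra arguments are passed through, e.g. --advertise-addr MY_IP
    docker swarm init "$@"
  fi
}
```

Calling ensure_swarm, or ensure_swarm --advertise-addr MY_IP, is then safe to repeat on a node that is already a manager.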

Once the Swarm is created we could add more nodes, but that is not needed for this article: a one-node Swarm is totally fine.

Disclaimer: do not run a one-node Swarm in production though :)

Deploying a service

On a Swarm, a service can be deployed either manually or through a stack file. Let’s use the first option and create a service based on the instavote/vote image.

The command is the following one.

$ docker service create \
--name vote \
--replicas 4 \
--publish 5000:80 \
instavote/vote

It specifies 4 replicas for the service. Under the hood, this means 4 tasks (a task runs a container) of the vote service are now running on the Swarm. A request sent to the vote service on port 5000 is load-balanced (round robin by default) to one of the 4 tasks.
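
A quick way to check that the 4 tasks are really up is to count the ones whose current state starts with "Running". The function name running_tasks is an assumption made for this sketch.

```shell
# Count the tasks of a service that are currently running.
# `docker service ps` prints states such as "Running 2 minutes ago".
running_tasks() {
  docker service ps "$1" \
    --filter desired-state=running \
    --format '{{.CurrentState}}' | grep -c '^Running'
}
```

With the service created above, running_tasks vote should print 4 once all the replicas have started.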

The screenshots below illustrate 2 successive calls to the vote service; each call is handled by a different container.
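
The same observation can be made from a terminal. The sketch below assumes that the page returned by the vote service contains the text "Processed by container ID <id>"; if your version of the image renders it differently, adapt the pattern. The name who_served is hypothetical.

```shell
# Send N requests (default 4) to the published port and print which
# container handled each one.
# ASSUMPTION: the HTML contains "container ID <hexadecimal id>".
who_served() {
  n="${1:-4}"
  i=0
  while [ "$i" -lt "$n" ]; do
    curl -s http://localhost:5000 | grep -o 'container ID [0-9a-f]*'
    i=$((i + 1))
  done
}
```

Running who_served 4 should list several different container IDs if the requests are spread across the tasks.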

Note: this service only provides the frontend of the Voting App and does not actually record votes, but that is enough to illustrate the rolling update, as we will see below.

Rolling update

Let’s say we now need to update the service and replace the original image with instavote/vote:indent. Updating a service is as simple as:

$ docker service update \
--image instavote/vote:indent \
vote

Let’s see it live:

In the above video, we can see that the tasks of the service are updated sequentially. Each one goes through the following states:

  • preparing
  • ready
  • starting
  • running

Once a task is updated, it’s the next one’s turn, and so on. This is the default behavior. When sending a request to the service, we can now vote between Tabs and Spaces.

Using the following options in the service configuration, we can customize the way the update is done:

  • --update-parallelism: number of tasks to update at the same time
  • --update-delay: time to wait before updating the next batch of tasks

Note: the documentation provides the list of all the options that can be used for the service creation / update

Let’s update the service once again, specifying that we want the tasks to be updated 2 at a time and that each batch should start 10 seconds after the previous one is done. The following command does that:

$ docker service update \
--update-parallelism 2 \
--update-delay 10s \
vote
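
To check that the new policy was recorded, the update configuration can be read back from the service specification. This is a small sketch; update_config is just an illustrative name.

```shell
# Print the update policy stored in the service specification as JSON.
# Note: durations such as Delay are reported in nanoseconds.
update_config() {
  docker service inspect --format '{{json .Spec.UpdateConfig}}' "$1"
}
```

After the command above, update_config vote should report a Parallelism of 2 and a Delay of 10000000000 (10 seconds).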

We can see the tasks are not stopped during the process. They remain in the running state, as only the service parameters are updated. No need for a restart here.

If we now update the service with a new image, the rolling update will act differently. Let’s update the service with the instavote/vote:movies image:

$ docker service update \
--image instavote/vote:movies \
vote

As expected, 2 tasks are updated first, and the 2 others follow 10 seconds later. We can now choose between sci-fi movies.

Service inspection

If we inspect the service, we can see the Spec and PreviousSpec keys. While the Spec key refers to the current specification of the service, the one with the movies image tag, the PreviousSpec refers to the image with the indent tag. Swarm keeps track of 2 levels in the service’s history.

$ docker service inspect vote
[
  {
    "ID": "osztlllvdwswkzyrjojcykakf",
    "Version": {
      "Index": 67004
    },
    "Spec": {
      "Name": "vote",
      "TaskTemplate": {
        "ContainerSpec": {
          "Image": "instavote/vote:movies@sha256:b50..d8c",
          ...
        },
        ...
      },
      ...
    },
    "PreviousSpec": {
      "Name": "vote",
      "TaskTemplate": {
        "ContainerSpec": {
          "Image": "instavote/vote:indent@sha256:265...c35",
          ...
        },
        ...
      },
      ...
    },
    ...
  }
]
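
Rather than scanning the whole JSON document, the two images can be extracted with a Go template. The function names below are assumptions made for this sketch, and previous_image only works once the service has been updated at least once (otherwise there is no PreviousSpec to read).

```shell
# Image of the current specification of a service.
current_image() {
  docker service inspect \
    --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}' "$1"
}

# Image of the previous specification (requires at least one earlier update).
previous_image() {
  docker service inspect \
    --format '{{.PreviousSpec.TaskTemplate.ContainerSpec.Image}}' "$1"
}
```

Here current_image vote would print the movies image and previous_image vote the indent one.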

Rollback

Because Swarm knows the previous specification of the vote service, it’s possible to roll back to it with the following command:

$ docker service rollback vote

As we can see, the tasks are rolled back one after another. Since we did not provide any configuration specifying how the rollback should be done, it uses the default behavior: one task at a time, with no delay between tasks. As with the rolling update, we could have used the following options in the service definition to control how the rollback is done:

  • --rollback-parallelism
  • --rollback-delay

Note: the documentation provides the list of all the options that can be used for the service creation / update
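
As an illustration, the rollback policy could be set on the existing service like this. The wrapper set_rollback_policy is a hypothetical helper; the two flags are the ones listed above.

```shell
# Hypothetical helper: set how many tasks are rolled back at once and
# how long to wait between two batches.
# Arguments: <service> <parallelism> <delay>
set_rollback_policy() {
  docker service update \
    --rollback-parallelism "$2" \
    --rollback-delay "$3" \
    "$1"
}
```

For example, set_rollback_policy vote 2 10s would make a later rollback proceed 2 tasks at a time with a 10 second pause between batches.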

After the rollback, the image instavote/vote:indent is the one used once again by the service.

About automatic rollback

In the example above, we performed a manual rollback of the service. But… what if we want the rollback to be automated in case the new version is crappy? Let’s consider another example.

lucj/whoami:1.0 and lucj/whoami:2.0 are 2 versions of a simple API that only returns the hostname of the container that handles an HTTP GET request received on the /whoami endpoint. The image with tag 1.0 runs fine; the one with tag 2.0 is buggy. The Dockerfile used to build those images is the following one:

FROM mhart/alpine-node:6.11.4

RUN apk update && apk add curl

HEALTHCHECK --interval=5s --timeout=3s --retries=3 \
CMD curl -f http://localhost:8000/whoami || exit 1


COPY package.json /app/package.json
WORKDIR /app
RUN npm install
COPY . /app

CMD ["npm", "start"]

An important thing to note here is the HEALTHCHECK instruction, which is used to verify the health of the service. Every 5 seconds it checks, from within the container, whether the /whoami endpoint is responding. Swarm uses the result of this healthcheck to trigger additional actions that we can configure at the service level.
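
The result of the healthcheck can also be observed by hand. The sketch below lists the containers of a service running on the local node and prints the health status Docker recorded for each; service_health is a made-up name, and it relies on the com.docker.swarm.service.name label that Swarm sets on the containers it starts.

```shell
# Print "<container id> <health status>" for each local container of a service.
# The status is "starting", "healthy" or "unhealthy".
service_health() {
  for id in $(docker ps --quiet --filter "label=com.docker.swarm.service.name=$1"); do
    echo "$id $(docker inspect --format '{{.State.Health.Status}}' "$id")"
  done
}
```

Once a service built from one of these images is running, service_health whoami shows which containers pass the check.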

Let’s create the service using the image with tag 1.0.

$ docker service create --name whoami \
--replicas 2 \
--update-failure-action rollback \
--update-delay 10s \
--update-monitor 10s \
--publish 8000:8000 \
lucj/whoami:1.0

We set the --update-failure-action flag to rollback so that a failed update automatically triggers a rollback. The --update-delay and --update-monitor flags control the timing: the first is the wait between two batches of tasks, the second is how long Swarm monitors each updated task for failure, which leaves time for the healthcheck to run.

Let’s update the service with the buggy lucj/whoami:2.0 image.
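
The update uses the same --image option as before. The helper below is only a sketch (update_image is a hypothetical name): it runs the update, then prints the image the service specification ends up with, so that the outcome is visible right away.

```shell
# Hypothetical helper: update a service to a new image, then print the image
# found in its specification afterwards.
# Arguments: <service> <image>
update_image() {
  service="$1"
  image="$2"
  docker service update --image "$image" "$service"
  docker service inspect \
    --format '{{.Spec.TaskTemplate.ContainerSpec.Image}}' "$service"
}
```

With update_image whoami lucj/whoami:2.0, a last line that still shows the 1.0 tag suggests the update did not stick.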

Because the first task cannot be updated (its healthcheck fails), a rollback is triggered automatically. In the process, the second task is not impacted, which keeps the service available. If we had specified only one replica, the service would be down during the update process, and obviously this is something we want to avoid.
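
The outcome can also be read from the update status of the service. The following sketch polls that status until the update or the rollback has finished; wait_for_update is an illustrative name, and the function assumes an update was started on the service at some point.

```shell
# Poll the update status of a service until it settles.
# Returns 0 when the update or the rollback completed,
# 1 when it is paused or when no status can be read.
wait_for_update() {
  while :; do
    state=$(docker service inspect \
      --format '{{.UpdateStatus.State}}' "$1" 2>/dev/null) || state=""
    echo "update state: ${state:-unknown}"
    case "$state" in
      completed|rollback_completed) return 0 ;;
      paused|rollback_paused|"") return 1 ;;
    esac
    sleep 2   # still "updating" or "rollback_started": check again shortly
  done
}
```

In the scenario above, wait_for_update whoami would be expected to end on rollback_completed.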

Summary

I hope this article was useful to illustrate the rolling update capability of Docker Swarm. Specifying the update configuration options when defining a service is a good practice, as it helps minimize downtime.