MongoDB replica set on swarm mode

Luc Juggery
@lucjuggery
6 min read · Sep 26, 2016

In this article we will see how to deploy a 3-node MongoDB replica set (1 primary and 2 secondaries) on a Docker cluster created with swarm mode.

Swarm mode: quick introduction

Basically, swarm mode is one of the great features of Docker 1.12: it is an optional mode of the Docker engine that makes it very easy to create a secure cluster.

If you are not that familiar with swarm mode yet, the Docker documentation will help you get started in no time.

MongoDB replica set: quick introduction

A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.

If you want details on this architecture, the official website is full of explanations, diagrams, and examples. Definitely worth a read.

One of the simplest replica set setups is the following one, made up of 1 primary and 2 secondaries.

Why a MongoDB replica set on swarm mode?

Introduced in Docker 1.12, swarm mode has been a really hot topic for the past couple of months, as it makes Docker cluster creation a breeze. Additionally, when a service is deployed on a swarm it gets some great features out of the box, among them:

  • replication across nodes
  • DNS-based load balancing
  • automatic rescheduling in case of failure
  • a declarative service model
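To illustrate what these features look like in practice, here is a minimal example (nginx is used purely as an illustration; run on a manager node):

```shell
# Deploy a service with 3 replicas; the swarm spreads the tasks across nodes,
# load balances requests through the service's DNS name / VIP, and reschedules
# failed tasks automatically.
docker service create --name web --replicas 3 -p 80:80 nginx

# Declarative model: update the desired state, the swarm converges towards it.
docker service scale web=5
```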

The purpose of this article is to test how MongoDB behaves when running a production-like environment (a replica set) using Docker swarm mode.

Creation of our swarm cluster

In order to quickly set up a swarm cluster, please check the following article, which will help you create a swarm with 2 managers and 2 workers.
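For reference, the swarm creation itself boils down to a few commands; a rough sketch (IPs and tokens below are placeholders):

```shell
# On the first manager node:
docker swarm init --advertise-addr <MANAGER_IP>

# Print the join commands (with their tokens) for additional managers/workers:
docker swarm join-token manager
docker swarm join-token worker

# On each additional node, run the join command printed above, e.g.:
docker swarm join --token <TOKEN> <MANAGER_IP>:2377
```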

Now that we have the nodes set up, what strategy can we use to deploy our MongoDB replica set?

Setting up our mongo services

One possible way to set up a replica set on a swarm cluster is to define one service per replica set member (one service for the primary, and one service for each of the secondaries), plus another service in charge of configuring the replica set.

In a compose file, that could look like:

version: '2'
services:
  rs1:
    image: mongo:3.2
    command: mongod --replSet "rs0"
  rs2:
    image: mongo:3.2
    command: mongod --replSet "rs0"
  rs3:
    image: mongo:3.2
    command: mongod --replSet "rs0"
  rs:
    image: lucj/mongors

Each of the rs1, rs2, and rs3 services is based on the mongo:3.2 image and run with the mongod --replSet "rs0" command, to specify that the mongod instance is part of the replica set named rs0.

The fourth service, named rs, is based on the lucj/mongors image, which is built with the following Dockerfile.

FROM mongo:3.2
COPY init.sh /tmp/init.sh
CMD /tmp/init.sh

The image is based on the official mongo:3.2 image available from Docker Hub, and uses a simple init.sh shell script that will be triggered at startup.

#!/bin/bash

# Make sure the 3 replicas are available
for rs in rs1 rs2 rs3; do
  mongo --host $rs --eval 'db'
  if [ $? -ne 0 ]; then
    exit 1
  fi
done

# Connect to rs1 and configure the replica set if not already done
status=$(mongo --host rs1 --quiet --eval 'rs.status().members.length')
if [ $? -ne 0 ]; then
  # Replica set not yet configured
  mongo --host rs1 --eval 'rs.initiate({ _id: "rs0", version: 1, members: [ { _id: 0, host: "rs1" }, { _id: 1, host: "rs2" }, { _id: 2, host: "rs3" } ] })'
fi

Basically, this script will make sure the 3 mongod instances are up and running and will then configure the replica set if not already configured.

Let’s test this against our newly created swarm… Wait a minute! This is not possible, as a compose file cannot be run against a swarm in swarm mode; a DAB (Distributed Application Bundle) must be used instead.

No problem, let’s create our DAB from this compose file. It’s as simple as:

$ docker-compose bundle
Wrote bundle to mongors.dab

Our mongors.dab file is the following one:

{
  "Services": {
    "rs": {
      "Image": "lucj/mongors@sha256:1e31dad5a4ea5e9ecc0681f775d0caa6a53e0b3bc04ad76aa82660aed3d39f66",
      "Networks": [
        "default"
      ]
    },
    "rs1": {
      "Args": [
        "mongod",
        "--replSet",
        "rs0"
      ],
      "Image": "mongo@sha256:8ff7bd4acdb123e3922a7fae7f73efa35fba35af33fad0de946ea31370a23cc4",
      "Networks": [
        "default"
      ]
    },
    "rs2": {
      "Args": [
        "mongod",
        "--replSet",
        "rs0"
      ],
      "Image": "mongo@sha256:8ff7bd4acdb123e3922a7fae7f73efa35fba35af33fad0de946ea31370a23cc4",
      "Networks": [
        "default"
      ]
    },
    "rs3": {
      "Args": [
        "mongod",
        "--replSet",
        "rs0"
      ],
      "Image": "mongo@sha256:8ff7bd4acdb123e3922a7fae7f73efa35fba35af33fad0de946ea31370a23cc4",
      "Networks": [
        "default"
      ]
    }
  },
  "Version": "0.1"
}

In the current version of the DAB format, it is not possible to add parameters such as --restart-max-attempts to specify the number of times a service will restart when it fails.

In our case, the rs service will fail until the rs1, rs2, and rs3 services are available (meaning that a mongo client can connect to each of the mongod instances), and will then set up the replica set once this condition is met. It is of no further use after the setup is done.

In this configuration, it would keep restarting indefinitely, when we only need it to set up the initial replica set once.

Instead of a DAB, we will then use a shell script based on the service API; this script manually defines the services that will run and specifies a maximum number of restarts for the service dedicated to the configuration.
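A minimal sketch of such a script could look like the following (the overlay network name is an assumption; run on a swarm manager):

```shell
# Assumption: an overlay network named "mongo" so services can reach each
# other by name (rs1, rs2, rs3).
docker network create --driver overlay mongo

# One service per replica set member
for i in 1 2 3; do
  docker service create \
    --name rs$i \
    --network mongo \
    mongo:3.2 mongod --replSet "rs0"
done

# Configuration service: restart on failure, at most 10 times
docker service create \
  --name rs \
  --network mongo \
  --restart-condition on-failure \
  --restart-max-attempts 10 \
  lucj/mongors
```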

Note: hopefully, in a future release more parameters will be available, so that DABs can be controlled with finer granularity and preferred over shell scripts.

Note: we’ve indicated the rs service should be restarted 10 times at most if it fails. I do not really like this approach, as it is not very precise and is error-prone, but it will be just fine for an example. Also, it would be great if that could be configured dynamically.

Running this script on one of our swarm managers will create the replica set as we expect.

Is our MongoDB replica set configured correctly? We first need to find the host on which our MongoDB primary is running, and then issue the rs.status() command.
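One way to do this (one option among others) is to list the tasks of each service, then query one of the members from a host attached to the same overlay network:

```shell
# Find the node each task runs on
docker service ps rs1

# Query the replica set status through the rs1 service and print each
# member's state (PRIMARY / SECONDARY):
mongo --host rs1 --quiet --eval \
  'rs.status().members.forEach(function(m){ print(m.name + " : " + m.stateStr) })'
```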

This output shows that among the 3 members, 1 has PRIMARY status and the 2 others have SECONDARY status. Everything seems fine.

If one of the replica set’s nodes fails, the service’s task (i.e. container) will be rescheduled on another node and will still be accessible through the VIP allocated to the service when it was created. In other words, a client targeting the replica set will not be impacted by a failover, as it uses a MongoDB URL like the following, which does not depend on an actual container’s IP.

mongodb://VIP_rs1,VIP_rs2,VIP_rs3/?replicaSet=rs0

Note: the important thing to note is that the VIPs assigned to the services (rs1, rs2, rs3) do not change during the lifetime of the services.
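The VIPs can be retrieved by inspecting each service; for example:

```shell
# Print the virtual IP(s) allocated to the rs1 service
docker service inspect \
  --format '{{ range .Endpoint.VirtualIPs }}{{ .Addr }} {{ end }}' rs1
```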

We can now very easily create data on the primary and check that it is replicated correctly to the secondaries. We could also kill a node on which a mongod instance is running, observe the failover, and play with it a little, but that might be the subject of a later article.
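For instance, from a host attached to the same network, a quick replication check could look like this (assuming rs2 is currently one of the secondaries):

```shell
# Write a document through the primary (rs1)
mongo --host rs1 --eval 'db.test.insert({ msg: "hello" })'

# Allow reads on a secondary and verify the document was replicated
mongo --host rs2 --eval 'rs.slaveOk(); printjson(db.test.findOne())'
```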

Summary

We’ve seen one usage of Docker’s service API to create a MongoDB replica set. Of course, this configuration is just for testing purposes. Additional options could also be provided (e.g. constraints to make sure 2 mongod instances do not run on the same node) to make the solution more robust. Hopefully, in the near future, DABs will offer many more options that will ease this kind of setup.

I would love to get some feedback on this topic.
