Some notes on running MongoDB replica sets on Kubernetes, persistent data with EBS, and cluster node management

I wrote a blog post earlier this week about using services in Kubernetes, aimed at readers who come from a Docker-specific background and found the Kubernetes tools easy enough to follow, but not readily analogous to developing with Docker locally and then translating that to a production environment:

I got some feedback that it wasn’t clear how this could be beneficial (without some exposition), so I decided to use a pretty common, and pretty straightforward, example: excellent, distributed software that is a very good candidate for containerization.

I won’t go into the benefits and drawbacks of containerizing software like MongoDB and, in this case, will assume your (hypothetical) needs are something like:

  1. Needing to scale MongoDB
  2. Persistent data as you scale up and down, and in failure scenarios (where Kubernetes container policy options will definitely bring you value)
  3. This needs to happen somewhat opaquely (an endpoint exposed to your application, for example, for this somewhat dynamic MongoDB service)

All you’ll need is:

  1. The awscli package installed on your machine (if you plan to use EBS to make the data persist)
  2. A running Kubernetes cluster (you’ll need the awscli to deploy the cluster if you do not have one already, or you can use a hosted service or a distribution like Tectonic with an installer)

Creating the MongoDB Replica Set

There are two components here: your Kubernetes pods, and the volumes as exposed to Kubernetes.

If the data need not be persistent, you can just create your pod without a volume (ideally, the pod would live in a deployment or replication controller so that it is recreated in the event of node issues). However, the data will not survive a crashed container, and the cluster may need to be (basically) continuously rebuilt:

apiVersion: v1
kind: Pod
metadata:
  name: mongodb
  labels:
    app: mongodb
spec:
  containers:
  - name: mongo-node-1
    image: mongo
    imagePullPolicy: Always
    command:
    - mongod
    - "--replSet"
    - mongo-k8s
    ports:
    - containerPort: 27017

You would then create additional pods like the above to act as the replicas. To make the data persist, you can use the awscli to define an EBS device for each of the pods:

aws ec2 create-volume --availability-zone $AZ --size $VOL_SIZE --volume-type gp2

and make note of the resulting volume IDs.

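To make the pod use the volumes, amend the configuration like this (note that mongod writes to /data/db by default, so either pass --dbpath /mongo-data or mount the volume at /data/db instead):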

    volumeMounts:
    - mountPath: /mongo-data
      name: mongo-data-1

inside the container’s entry in the containers: block and, nested within spec, add the EBS information:

  volumes:
  - name: mongo-data-1
    awsElasticBlockStore:
      volumeID: <volume-id>
      fsType: ext4

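Creating the pod(s) from this configuration starts the mongod processes for the cluster. The replica set itself still has to be initiated once, for example by running rs.initiate() from a mongo shell on one of the members, and the remaining members added to it.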
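Putting the fragments above together, a complete manifest for one member might look roughly like the following. This is only a sketch: the --dbpath and --bind_ip_all arguments are my additions (assuming MongoDB 3.6 or later, which otherwise listens on localhost only), the volume ID is a placeholder, and the EBS volume has to live in the same availability zone as the node the pod lands on. Newer Kubernetes releases favour the EBS CSI driver over the in-tree awsElasticBlockStore volume type, so treat that part as version-dependent.

```yaml
# Sketch: the pod from above with its EBS-backed volume wired in.
apiVersion: v1
kind: Pod
metadata:
  name: mongodb
  labels:
    app: mongodb
spec:
  containers:
    - name: mongo-node-1
      image: mongo
      imagePullPolicy: Always
      command:
        - mongod
        - "--replSet"
        - mongo-k8s
        - "--dbpath"        # assumption: store data on the mounted volume
        - /mongo-data
        - "--bind_ip_all"   # assumption: MongoDB 3.6+, listen on the pod IP
      ports:
        - containerPort: 27017
      volumeMounts:
        - mountPath: /mongo-data
          name: mongo-data-1
  volumes:
    - name: mongo-data-1
      awsElasticBlockStore:
        volumeID: <volume-id>   # placeholder: ID returned by create-volume
        fsType: ext4
```

Each additional replica would get its own copy of this manifest with a distinct pod name, container name, volume name and volume ID.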

Exposing MongoDB to Applications

If your application also lives on your Kubernetes cluster, the simplest means of linking the applications is through a service, and the configuration will look something like:

kind: Service
apiVersion: v1
metadata:
  name: mongodb
  namespace: default
spec:
  selector:
    app: mongodb
  ports:
  - port: 27017
    targetPort: 27017

This will allow the service name, mongodb, to resolve to the pod addresses backing the MongoDB service. However, if you want to get more granular with how the cluster is interacted with, you can segment requests the same way: since the service uses labels as the selector connecting a service name to specific pods, additional labels let you route to a subset of them.

Adding a label to the replicas (pods 2 and 3, for example) to direct only read traffic to them is simple: add something like this to the configuration for those pods:

  labels:
    app: mongodb
    io: read-replica

Then use the read-replica label as the selector in the above service, and use that service name for read operations. This will usually suffice, and lets you safely allow read access from services outside the cluster if you wanted (exposing it through ELB, for example, with the LoadBalancer service type). You can likewise do the same for the current primary Mongo node to target write operations.
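A read-only service along those lines could be sketched as follows. The service name mongodb-read is hypothetical, and type: LoadBalancer is only needed if you actually want the endpoint reachable from outside the cluster.

```yaml
# Sketch: a second service that only selects pods labelled as read replicas.
kind: Service
apiVersion: v1
metadata:
  name: mongodb-read          # hypothetical name
  namespace: default
spec:
  type: LoadBalancer          # on AWS this provisions an ELB
  selector:
    app: mongodb
    io: read-replica          # only pods carrying both labels are targeted
  ports:
    - port: 27017
      targetPort: 27017
```

Keep in mind that these labels are static: Kubernetes does not know which member MongoDB has elected as primary, so a label marking the primary has to be updated by you (or by tooling) after a failover.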

Extending the Cluster, Making the Most of Kubernetes

Adding new members to the replica set only requires adding additional replica pods, and since the mongodb service definition selects target pods using the built-in labeling system, it should pick up new backend pods as they are provisioned.

This is great, and adds some redundancy to a fairly robust system, so the obvious remaining point of failure is the EC2 node fleet that makes up the cluster. Since several MongoDB pods can, theoretically, be scheduled onto the same node, losing that node for whatever reason (and for as long as it takes to be rebuilt, depending upon your autoscaling group policies, etc. in AWS) can result in an outage, so you can further target your deployment by:

  1. Labeling your nodes
  2. Using the nodeSelector key in your configuration to manually target pods at specific nodes, keeping each one isolated from the rest of the MongoDB cluster so that, in this case, the replica set remains online if a worker node drops out of the Kubernetes cluster.
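As a rough illustration of those two steps, assume a node has already been labelled with something like kubectl label nodes <node-name> mongo-member=replica-2. The label key and value, and the pod and container names, are made up for this example.

```yaml
# Sketch: pin one replica to the node carrying a specific label.
apiVersion: v1
kind: Pod
metadata:
  name: mongodb-2                   # hypothetical pod name
  labels:
    app: mongodb
spec:
  nodeSelector:
    mongo-member: replica-2         # hypothetical label set on the target node
  containers:
    - name: mongo-node-2
      image: mongo
      command: ["mongod", "--replSet", "mongo-k8s"]
      ports:
        - containerPort: 27017
```

If no node carries the label, the pod simply stays Pending, which is worth knowing before you rely on this for failover.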

Another (more automated) approach to this is to effectively reserve the node resources. If you want to avoid deploying to a specific node, you can use the tainting feature in kubectl on said node to prevent Kubernetes from (re)scheduling pods onto that cluster member unless they carry a toleration matching the taint (so, for example, reserving the node for specific workloads).
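To make the taint side concrete, suppose a node was tainted with kubectl taint nodes <node-name> dedicated=mongodb:NoSchedule. The key and value are invented for this sketch. A pod that is still allowed onto that node would then declare a matching toleration:

```yaml
# Sketch: a pod that tolerates the hypothetical dedicated=mongodb:NoSchedule taint.
apiVersion: v1
kind: Pod
metadata:
  name: mongodb-3                   # hypothetical pod name
  labels:
    app: mongodb
spec:
  tolerations:
    - key: dedicated                # must match the taint's key
      operator: Equal
      value: mongodb                # must match the taint's value
      effect: NoSchedule            # must match the taint's effect
  containers:
    - name: mongo-node-3
      image: mongo
      command: ["mongod", "--replSet", "mongo-k8s"]
      ports:
        - containerPort: 27017
```

A toleration only permits scheduling onto the tainted node; it does not attract the pod there. To actually reserve a node for MongoDB you would typically combine the taint with a nodeSelector or node affinity on the same pods.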

Kubernetes, like Docker Swarm, has multiple affinity options, meaning that the way pods are scheduled adheres to algorithmically defined behavior (for example, like the binpack method in Swarm, but with a little more flexibility from your out of the box options):
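For the MongoDB case, the most relevant of these options is probably pod anti-affinity, which asks the scheduler to keep pods with the same label off the same node. A minimal sketch, reusing the app: mongodb label from earlier (pod and container names are hypothetical):

```yaml
# Sketch: never co-locate two pods labelled app=mongodb on one node.
apiVersion: v1
kind: Pod
metadata:
  name: mongodb-2                   # hypothetical pod name
  labels:
    app: mongodb
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: mongodb
          topologyKey: kubernetes.io/hostname   # "same node" is the unit of separation
  containers:
    - name: mongo-node-2
      image: mongo
      command: ["mongod", "--replSet", "mongo-k8s"]
      ports:
        - containerPort: 27017
```

With the required form, a replica stays Pending if every node already hosts a MongoDB pod; preferredDuringSchedulingIgnoredDuringExecution is the softer variant if you would rather have it scheduled somewhere than not at all.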

Further Reading

An excellent read, and a more in-depth exploration of how MongoDB can be containerized and run atop Kubernetes (albeit not specific to AWS), can be found here:

Another excellent post on using tainting in Kubernetes is this one, which shows some really great, more involved usages: