How To Set Up a Mongo Replica Set on Kubernetes (The Quick and Easy Way)
Disclaimer: This method is intended to quickly set up a dev environment for testing, and is not meant for production environments.
What is a Mongo Replica Set
A basic Mongo Replica Set involves three Mongo instances:
One primary instance and two secondary instances. The primary instance is in charge of reading and writing data as requested by the client application, while the secondary instances replicate all write operations from the primary to keep a full copy of the dataset.
Each instance has its own set of data files, which provides data redundancy.
If the primary instance fails, one of the secondary instances is elected as the new primary and takes over the role, which provides service redundancy.
How Mongo Replica Set is initialized:
- Startup of all Mongo instances with Replica Set name specified
- Replica Set config is passed into one of the instances via Mongo Shell
- Mongo instances share the Replica Set config and find all peer instances
- The instances vote to elect a primary
- Mongo Replica Set Initialized
Starting Mongo Instances
mongod --replSet rs0 --port 27017 --bind_ip localhost,$POD_IP_ADDRESS --dbpath /data/db/rs0-0 --oplogSize 128
When starting Mongo in K8s, it is important to bind mongod to the Pod’s IP address; otherwise, you may have trouble connecting to it. You also need to specify the Replica Set name, and it has to match the Replica Set configuration you pass in later on.
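The $POD_IP_ADDRESS variable in the command above has to come from somewhere. One way to populate it is the K8s downward API, in the container’s env section of the Pod definition (a minimal fragment; the variable name is just what the command above assumes):

env:
  - name: POD_IP_ADDRESS
    valueFrom:
      fieldRef:
        fieldPath: status.podIP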
Sending In Replica Set Configuration
The following configuration is sent via the Mongo Shell. The higher an instance’s priority, the more likely it is to be elected as the primary instance:
rs.initiate({
  "_id" : "rs0",
  "members" : [
    {
      "_id" : 0,
      "host" : "10.0.40.38:27017",
      "priority" : 1
    },
    {
      "_id" : 1,
      "host" : "10.0.217.218:27017",
      "priority" : 0.9
    },
    {
      "_id" : 2,
      "host" : "10.0.176.196:27017",
      "priority" : 0.5
    }
  ]
})
Now that we know how a Mongo Replica Set is set up, let’s talk about how we’d implement this on a K8s cluster.
Implementations (AKS/OnPrem/Minikube)
AKS = Azure Kubernetes Service
Main Challenges
The main challenges of implementing a Replica Set (as illustrated above) are Storage and Service Discovery. Let’s have a look at how we intend to solve these two problems:
Storage (Persistent Volume)
Pod storage is not persistent by default; anything you save on a Pod will be wiped out when it restarts.
For AKS, we can use Azure Managed Disks for PV. You could think of it as Google Drive for your Pods.
For OnPrem and Minikube, it’s easiest to use a HostPath PV. This works like Docker volume binding, which binds a path between the K8s Node and your Pods.
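For reference, a minimal HostPath PV sketch (the name and path are illustrative, not from the repo):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongodb-0-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /mnt/data/mongodb-0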
Service Discovery
Unlike a Mongo instance started locally, a Pod on K8s is assigned its IP address dynamically. Every time a Pod restarts, it may get assigned yet another IP address.
We need a way to let Mongo instances find each other’s IP addresses and port numbers, and it cannot be hardcoded.
We’ll use K8s Service to wrap around each Pod to provide it with a stable IP Address and DNS name in the cluster.
P.S.: We can either use the built-in Domain Name Service (DNS) in your K8s cluster, or we can pass Environment Variables around. I prefer using Environment Variables because I can also pass network port numbers around, whereas DNS only resolves the IP address unless you do extensive configuration.
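K8s injects such Environment Variables automatically: for every Service that exists when a Pod starts, the Pod gets {SERVICE_NAME}_SERVICE_HOST and {SERVICE_NAME}_SERVICE_PORT variables (name uppercased, dashes turned into underscores). Assuming the Service names used later in this article:

# Inside a Pod created after the Services exist:
echo $MONGODB_0_SERVICE_SERVICE_HOST   # Cluster IP of mongodb-0-service
echo $MONGODB_0_SERVICE_SERVICE_PORT   # 27017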
Azure Kubernetes Service (AKS)
For this article, we will focus on AKS. I’ll write more on how to implement this on an OnPrem cluster with HostPath as PV another time.
First, let’s go through the AKS implementation. This is, in my opinion, the most suitable for K8s virgins, since all underlying cluster management is carried out for the user.
You have a full year of free trial upon registration, you do however have to register on Azure with a credit card, but you have full control over when the payment will be activated. It’s unlikely that you have any accidental cost.
Setting Up Your First AKS Cluster
- Register Azure account for a free trial
- Install Azure CLI tool on your OS
- Launch Your AKS Cluster
I’ll write a guide on these three steps, later on. If you need help with this check back later! :)
Now, let’s head on to K8s manifests for the AKS implementation of Mongo Replica Set…
Kubernetes Manifests
You can find all K8s manifests in this GitHub Repo. If you intend to run this in your environment, I highly recommend forking the repo instead of copying the code off of Medium. Otherwise, you may waste unnecessary time fixing indentation (yes, K8s manifest indentation is FRAGILE).
Kubernetes Objects
- 3 x Persistent Volume (PV)
- 3 x Persistent Volume Claim (PVC)
- 3 x Service (Cluster IP)
- 3 x Pod
Each Mongo instance consists of a PV, PVC, Service, and a Pod.
All components are identical; the only difference is the naming. They are named mongodb-0, mongodb-1, and mongodb-2.
We will go through the configuration of mongodb-0 below:
Persistent Volume Claim Definition
Here we’ve created a PVC named azure-managed-disk-mongodb-0, requesting Azure to provision a managed disk (which is the PV in this case) with 5 GB of storage.
Service (Cluster IP) Definition
We’ve created a K8s Cluster IP Service, which will forward all network calls on port 27017 to Pods with the label ‘service: mongodb-0’.
Pod Definition
Notice a few key things in this definition:
- The Pod’s label in line 7 matches the label defined in Service
- An Environment Variable named MONGODB_ID is passed into the Pod, this matches the name of the Pod
- The pod will mount the PVC onto “/data/db” which keeps the data intact as long as PVC is not deleted
The image used for this Pod will run the appropriate startup script for mongodb-0/1/2 by reading the Environment Variable MONGODB_ID.
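A minimal sketch of such a Pod definition (the image name is a placeholder for the custom image described above; the real one is in the repo):

apiVersion: v1
kind: Pod
metadata:
  name: mongodb-0
  labels:
    service: mongodb-0
spec:
  containers:
    - name: mongodb-0
      image: your-registry/mongodb-replica:latest   # placeholder image
      ports:
        - containerPort: 27017
      env:
        - name: MONGODB_ID            # tells the startup script which member this is
          value: mongodb-0
        - name: POD_IP_ADDRESS        # used by the mongod --bind_ip flag
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
      volumeMounts:
        - name: mongo-data
          mountPath: /data/db
  volumes:
    - name: mongo-data
      persistentVolumeClaim:
        claimName: azure-managed-disk-mongodb-0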
Now repeat the same definition for mongodb-1, and mongodb-2.
Make sure you change all occurrences of ‘mongodb-0’ to ‘mongodb-1’ or ‘mongodb-2’.
mongo.yaml
Note that you can store all definitions in a single YAML file; this makes your life easier when starting and stopping all K8s objects.
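Separate each object with a ‘---’ line and K8s treats them as independent objects (abbreviated sketch):

# mongo.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-disk-mongodb-0
# ... rest of the PVC spec ...
---
apiVersion: v1
kind: Service
metadata:
  name: mongodb-0-service
# ... rest of the Service spec ...
---
# ... Pods and the remaining objects ...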
Startup Mongo Replica Set
This is the moment of truth! Before we start, let’s open two terminals side by side:
On your first terminal, type this to monitor resources on your K8s cluster:
watch kubectl get all
On your second terminal, type this to apply the K8s manifest we just worked on:
kubectl apply -f mongo.yaml
Now watch the objects come up on your first terminal.
Verify the Mongo Replica Set Is Initiated
We can do that by going into pod/mongodb-0:
kubectl exec -it pod/mongodb-0 -- /bin/bash
And then starting Mongo Shell:
mongo
On the last line of the shell’s startup output, you’ll see the Replica Set name and the role of this Mongo instance in the prompt, e.g. rs0:PRIMARY>.
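You can also list every member and its role explicitly from the same shell:

rs.status().members.forEach(function (m) {
  print(m.name + " -> " + m.stateStr)
})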
Next Step?
Congratulations! You now have a working Mongo Replica Set! Clients in the same cluster and namespace can connect to it with the following MongoURI:
mongodb://mongodb-0-service:27017,mongodb-1-service:27017,mongodb-2-service:27017/admin?replicaSet=rs0
All Mongo instances are included in the URI so the client driver knows which instances it can fall back on when one fails.
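For a quick test, you can connect from any Pod in the same namespace that has the Mongo Shell installed:

mongo "mongodb://mongodb-0-service:27017,mongodb-1-service:27017,mongodb-2-service:27017/admin?replicaSet=rs0"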
Summary
To sum up what we’ve done:
- Launched 3 Mongo instances on AKS with K8s objects (PV, PVC, Service, Pod)
- Connected 3 Mongo instances together as Replica Set
You can now connect your applications to this Replica Set at will, and watch the Mongo instances fail over to one another as you manually shut any of them down.