MongoDB ReplicaSet on Kubernetes with Auto-failover & Auto-scaling

[Image: AWS EKS with MongoDB]

The Current Conundrum:

For MongoDB versions ≤ 4.0 there is an existing Kubernetes sidecar that automatically applies the replica-set configuration, but I was not able to successfully deploy a MongoDB 6.0 replica set with that sidecar.

Therefore, to deploy MongoDB with the capacity to auto-assign replicas on scale-up and auto-remove them on scale-down, I used a combination of 'rs' commands, a ConfigMap and pod lifecycle hooks.

Before we get into the how, let us understand the WHY!

A standalone MongoDB (single-pod deployment) is easy to deploy and can be a good solution for dev stages, but such a deployment is very unstable in a production environment.

Following are the disadvantages of single-pod deployments:

  1. A standalone MongoDB deployment is limited to 1 pod, since the storage cannot be shared (the volume mount is restricted to 1 pod): low availability.
  2. If the running pod gets degraded, or the node is deleted and the pod needs re-scheduling, MongoDB will be unavailable until the old pod is terminated and the new pod is spun up to a ready state. A rolling update is not possible because the volume mount is restricted to 1 pod.
  3. If the database itself gets corrupted, no restoration of data is possible.
  4. A single-pod deployment may not be able to handle a high number of concurrent reads, and thus the case for a replica set becomes apparent.

Why Use a ReplicaSet?

A replica set is the most fail-safe way to deploy MongoDB, or other stateful applications, on Kubernetes. The MongoDB server runs as 1 primary mongo node (k8s pod) plus multiple secondary nodes, providing auto-failover if the primary node fails, and the secondary nodes (k8s pods) can serve reads in a load-balanced manner when load increases.

High Redundancy / Auto-failover:

A secondary replica can automatically fail over and become the primary node if the primary node fails.

If any of the EC2 instances gets degraded, the primary/secondary replicas running on the other EC2 instance nodes can keep the MongoDB server running.

Auto-Scaling:

Under heavy read load, the number of secondary replicas (pods) can scale up automatically and scale back down when the load reduces.

Synopsis: How Does Replication Happen?

  • The rs.initiate() command runs the first time the first MongoDB node (k8s pod) is deployed on Kubernetes. This initiates Mongo replication.
  • In addition, a user with root privileges is created to authenticate against the replica nodes.
  • All pods deployed subsequently are assigned as secondary mongo nodes.
  • The rs.add() command is used to add these secondary nodes to the replica set.
  • The rs.remove() command is used to remove a secondary mongo node from the replica-set configuration before the k8s pod terminates. (The commands are sketched below as mongosh one-liners.)
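
For reference, these are roughly the commands involved, expressed as mongosh one-liners (a minimal sketch; the host names follow the pod/service naming used later in this article):

# run once against the first pod to start the replica set
mongosh deploymongodb-0.mongodb-svc:27017 --eval 'rs.initiate({_id: "rs0", members: [{ _id: 0, host: "deploymongodb-0.mongodb-svc:27017" }]})'

# run against the primary for every new secondary pod
mongosh deploymongodb-0.mongodb-svc:27017 --eval 'rs.add({ host: "deploymongodb-1.mongodb-svc:27017" })'

# run against the primary before a secondary pod terminates
mongosh deploymongodb-0.mongodb-svc:27017 --eval 'rs.remove("deploymongodb-1.mongodb-svc:27017")'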

Pre-requisites:

  1. An AWS EKS cluster (preferred).
  2. The EFS CSI driver installed on EKS.
  3. An EFS file system in the same VPC as the EKS cluster, with inbound rules allowing access from the EKS node (instance) security group on the EFS network.

Why use EFS instead of EBS?

EBS (Elastic Block Store) is specific to an availability zone. This means only pods in the same AZ can mount the EBS volume.

EFS (Elastic File System), on the other hand, is available across all AZs under standard creation. This makes the volume available to pods in different AZs.
This makes the pods highly available, and we do not need to restrict the MongoDB deployment with nodeSelectors.

Deep-Dive:

All the Kubernetes manifest files for the MongoDB replica-set deployment on EKS are available on GitHub:
https://github.com/ankan-devops/k8s-mongo-replicaset.git

Now, let's get into the deployment process of the MongoDB replica set:

mongodb-deploy.yaml -> The YAML manifest for the MongoDB StatefulSet.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: deploymongodb
  namespace: mongo
  labels:
    app: deploymongodb

spec:
  revisionHistoryLimit: 3
  serviceName: mongodb-svc
  selector:
    matchLabels:
      app: deploymongodb
  updateStrategy:
    type: RollingUpdate
  minReadySeconds: 0
  template:
    metadata:
      labels:
        app: deploymongodb
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: deploymongodb
        image: mongo:6.0

        resources:
          limits:
            cpu: 1024m
          requests:
            cpu: 600m

        command:
        - /bin/sh
        - -c
        - |
          /data/mongodbinit/mongo-user.sh &
          mongod --replSet=rs0 --bind_ip_all --auth --dbpath=/data/db --keyFile=/data/mongodbkey/mongodb.key --setParameter=authenticationMechanisms=SCRAM-SHA-1;

        lifecycle:
          preStop:
            exec:
              command:
              - /bin/bash
              - -c
              - |
                if [ -f /data/db/admin-user.lock ]; then
                  if [ "$HOSTNAME" != "deploymongodb-0" ]; then
                    mongosh deploymongodb-0.mongodb-svc:27017 -u $MONGO_INITDB_ROOT_USERNAME -p $MONGO_INITDB_ROOT_PASSWORD --authenticationDatabase=admin --eval 'rs.remove("'$HOSTNAME'.mongodb-svc:27017")';
                    rm -f /data/db/admin-user.lock; fi; fi;
        ports:
        - containerPort: 27017

        envFrom:
        - secretRef:
            name: mongo-creds
        livenessProbe:
          exec:
            command:
            - bash
            - -c
            - |
              if [[ "$(mongosh localhost:27017 -u $MONGO_INITDB_ROOT_USERNAME -p $MONGO_INITDB_ROOT_PASSWORD --authenticationDatabase=admin --eval 'db.adminCommand({ping:1}).ok' --quiet)" == "1" ]]; then
                exit 0;
              else exit 1;
              fi
          initialDelaySeconds: 90
          periodSeconds: 60
          failureThreshold: 3
          timeoutSeconds: 5

        readinessProbe:
          exec:
            command:
            - bash
            - -c
            - |
              if [[ "$(mongosh localhost:27017 --authenticationDatabase=admin --eval 'db.adminCommand({ping:1}).ok' --quiet)" == "1" ]]; then
                exit 0;
              else exit 1;
              fi
          initialDelaySeconds: 5
          successThreshold: 1
          periodSeconds: 30
          timeoutSeconds: 5

        volumeMounts:
        - name: mongodb-persistent-storage
          mountPath: /data/db
        - name: mongodbkey
          mountPath: /data/mongodbkey
        - name: mongodb-init
          mountPath: /data/mongodbinit

      volumes:
      - name: mongodbkey
        configMap:
          name: mongodb-key
          defaultMode: 0400
      - name: mongodb-init
        configMap:
          name: mongodb-init
          defaultMode: 0777

  volumeClaimTemplates:
  - metadata:
      name: mongodb-persistent-storage
      annotations:
        volume.beta.kubernetes.io/storage-class: "efs-sc"
    spec:
      storageClassName: efs-sc
      accessModes: ["ReadWriteMany"]
      resources:
        requests:
          storage: 1Gi

Why StatefulSet?

  • The MongoDB replicas are deployed as a StatefulSet. The reasons for using a StatefulSet are:
    a. A unique, stable HOSTNAME for each pod.
    b. The same persistent volume is re-mounted if a pod restarts.
    c. Pods are deployed in ascending ordinal order, so the pod with suffix 0 will always be initiated as primary.

The Command section:

command:
- /bin/sh
- -c
- |
  /data/mongodbinit/mongo-user.sh &
  mongod --replSet=rs0 --bind_ip_all --auth --dbpath=/data/db --keyFile=/data/mongodbkey/mongodb.key --setParameter=authenticationMechanisms=SCRAM-SHA-1;
  • The command section runs the mongo-user.sh script on every pod start/restart. This is used to initiate the replica set for the first time or to add more replicas to it. More on this later.
  • The mongod command overrides the default configuration with the following flags:
    a. '--replSet=rs0' defines the replica-set name.
    b. '--bind_ip_all' binds mongod to all interfaces so the pod is reachable through the service.
    c. '--auth' starts the deployment with authentication enabled.
    d. '--dbpath=/data/db' stores the database under this volume path, which is mounted from the EFS persistent volume.
    e. '--keyFile' enables key-file authentication, which the replica-set pods need in order to authenticate and communicate with each other.
    f. '--setParameter=authenticationMechanisms=SCRAM-SHA-1' is added so that MongoDB cluster users can authenticate to the database.

Lifecycle preStop:

lifecycle:
  preStop:
    exec:
      command:
      - /bin/bash
      - -c
      - |
        if [ -f /data/db/admin-user.lock ]; then
          if [ "$HOSTNAME" != "deploymongodb-0" ]; then
            mongosh deploymongodb-0.mongodb-svc:27017 -u $MONGO_INITDB_ROOT_USERNAME -p $MONGO_INITDB_ROOT_PASSWORD --authenticationDatabase=admin --eval 'rs.remove("'$HOSTNAME'.mongodb-svc:27017")';
            rm -f /data/db/admin-user.lock; fi; fi;

The lifecycle -> preStop -> exec section runs on every pod termination. It is used to remove the secondary mongo node from the MongoDB replica-set configuration before the pod terminates.

Note: Only secondary mongo nodes are removed from the rs.conf() configuration; the primary mongo node must not be removed, or the entire replica set will fail on the next startup.

> All write operations such as 'rs.add' or 'rs.remove' need to run against the writable pod, which is the primary node <deploymongodb-0.mongodb-svc:27017>.
Thus, the first part of the mongosh command, 'mongosh deploymongodb-0.mongodb-svc:27017 -u $MONGO_INITDB_ROOT_USERNAME -p $MONGO_INITDB_ROOT_PASSWORD --authenticationDatabase=admin',
authenticates to the primary node.

> The '--eval rs.remove("'$HOSTNAME'.mongodb-svc:27017")' part actually removes the terminating pod from the replica set.
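
To verify that a terminated pod was really removed, the current membership can be listed from the primary (a quick manual check, not part of the manifests; it reuses the same credentials and host names as above):

# list the current replica-set members and their states
mongosh deploymongodb-0.mongodb-svc:27017 -u $MONGO_INITDB_ROOT_USERNAME -p $MONGO_INITDB_ROOT_PASSWORD \
  --authenticationDatabase=admin --quiet \
  --eval 'rs.status().members.map(m => m.name + " " + m.stateStr)'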

Liveness & Readiness probe:

livenessProbe:
  exec:
    command:
    - bash
    - -c
    - |
      if [[ "$(mongosh localhost:27017 -u $MONGO_INITDB_ROOT_USERNAME -p $MONGO_INITDB_ROOT_PASSWORD --authenticationDatabase=admin --eval 'db.adminCommand({ping:1}).ok' --quiet)" == "1" ]]; then
        exit 0;
      else exit 1;
      fi
  initialDelaySeconds: 90
  periodSeconds: 60
  failureThreshold: 3
  timeoutSeconds: 5

readinessProbe:
  exec:
    command:
    - bash
    - -c
    - |
      if [[ "$(mongosh localhost:27017 --authenticationDatabase=admin --eval 'db.adminCommand({ping:1}).ok' --quiet)" == "1" ]]; then
        exit 0;
      else exit 1;
      fi
  initialDelaySeconds: 5
  successThreshold: 1
  periodSeconds: 30
  timeoutSeconds: 5

The liveness & readiness probes are used to:

  • check the status of the MongoDB pods,
  • keep traffic from being routed to a pod until it becomes ready,
  • restart the pod if it gets degraded.
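
If a probe keeps failing, the same check can be run by hand inside a pod to see the raw output (an illustrative command; adjust the pod name as needed):

# run the readiness check manually inside the first pod
kubectl exec -n mongo deploymongodb-0 -- bash -c \
  "mongosh localhost:27017 --eval 'db.adminCommand({ping:1}).ok' --quiet"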

Database Volume Mount:

  • The volumeClaimTemplates section mounts the EFS storage class on the pods. EFS is used for persistent storage of the database, because using the local storage of the EC2 instances (k8s nodes) would wipe the data if an EC2 instance is deleted. More on this later.

mongodb-configmap.yaml -> The Kubernetes ConfigMap manifest consists of 2 components:

mongodb-init —
a. Initializes the first MongoDB pod as the primary node replica.
b. Adds the subsequent pods as secondary read replicas.

apiVersion: v1
kind: ConfigMap
metadata:
  name: mongodb-init
  namespace: mongo
data:
  mongo-user.sh: |
    #!/bin/bash
    if [ ! -f /data/db/admin-user.lock ]; then
      sleep 45;
      if [ "$HOSTNAME" = "deploymongodb-0" ]; then
        mongosh admin --eval 'rs.initiate({_id: "rs0", members: [{ _id: 0, host: "deploymongodb-0.mongodb-svc:27017" }]})';
        sleep 3;
        mongosh admin --eval 'db.getSiblingDB("admin").createUser({user : "'${MONGO_INITDB_ROOT_USERNAME}'",pwd : "'${MONGO_INITDB_ROOT_PASSWORD}'", roles: [ { role : "root", db : "admin" } ]})';
        mongosh db1 -u ${MONGO_INITDB_ROOT_USERNAME} -p ${MONGO_INITDB_ROOT_PASSWORD} --authenticationDatabase=admin --eval 'db.getSiblingDB("db1").createUser({user : "'${DB_USERNAME}'",pwd : "'${DB_PASSWORD}'", roles: [ { role : "dbAdmin", db : "db1" } ]})';
      else
        sleep 15;
        mongosh deploymongodb-0.mongodb-svc:27017 -u ${MONGO_INITDB_ROOT_USERNAME} -p ${MONGO_INITDB_ROOT_PASSWORD} --authenticationDatabase=admin --eval 'rs.add( { host: "'${HOSTNAME}'.mongodb-svc:27017" } )'; fi;
      touch /data/db/admin-user.lock;
    fi;

mongodb-key —
The key file used by all the secondary replicas to authenticate with the primary mongo node.

data:
  mongodb.key: |
    aP+dcZjtBO3Qjvwm2oaxjhYsGrezOi6oV49+dbpON8mTiyW0Zr359EUCYcLn/QcU
    3jPmPDnil1mdoIevcz1z/GF17gbmFVBFR9H24DMAeDzxzHhVusVp0NN6dn0mhCcO
    vQUaG9uTxKPAthkwG1JJJQwmzwhc5cXIq51hY5Ea6IoAWmxebvqohRjWf8KjXXE6
    ji0XguppRKEHk4LEZMoeztjAaSKFn/DF9IXuraHtR1RqViDbExQYc9Db/obUS1jN
    80dR7GpvE5IExju3uSSOv1LkjVgWHz/02sdk8qc9p/R6Zua6D/HuCI6OFYJ2btYq
    bIJ4rpphmI06d64XFzWexQeLpOSGBJQt+630U7GzcQHM9ARUYybrSoQ39i3w1Wci
    lMZq96AGh24WVWenIgKons0BOW4LH2RljUHl7XeSC3HC+3DtqHIuUa5I9JiAp0Ch
    8xnAcJuWr0ZQJDHr8iOJCWteOr+CMKy/UTgI96xYsq36YV+Ch/swNBVfHVsZQoIj
    jqGNh3EhZ266wZBUXrL7xfimxCvQlEGcklbew3WyIRBrWsIXm3Wf8KTz4wmYp4Cs
    petISeEAr2Lh2Bv2SBDXPF4RCFEfgDJvszPUHAkMTp2khO22s08ohYfq8ebMx6hC
    U3AnnbGtFqQ3PUdtWstPKoRLWp3cM6hc6rRECIpkJhzkPT9J2cmtwxLwvlKoSDZw
    UzZFptrD9I19xNP5QGacmIJCwFIm0Wwnv1bZUKz5+JopqyAPoTeq1k9J3bq2fRIX
    2ArTDUPqluqzlNI1PBM6+5A58arR+eSNQMfmOKfQVA6LV5kQruJrolFQIwqsnBwy
    Hf/OlHSjYR7gCRpd6Mh+O3hgd1baq+DnQgSGbsc+5qJks9sC4c0i1JIggM09+90g
    EMaS4C5m7YTgnHGUPDMS2iSr+lsxurrhR8h/3vU4JBMuGQysQpTpv1VpYxIFUhyq
    fCTVBz7aLengL4DxgEqwfLoDtIrMeMZuEyblNnH9G3oEBxnvJpY83vfHweVrneS7

Let's understand:

mongodb-init — The script in mongodb-init is used to initiate the replica set on the mongo server, or to add secondary replicas on scale-up.

  1. For the 1st pod, 'deploymongodb-0', which gets assigned as the primary mongo node:
  • In the 'mongosh admin --eval rs.initiate...' command, the <host: "deploymongodb-0.mongodb-svc:27017"> entry gets resolved through the service to pod 0.
  • The rs.initiate() command initiates the replica set and makes the 'deploymongodb-0' pod the primary.
  • The 1st createUser() command -> 'db.getSiblingDB("admin").createUser({user : "'${MONGO_INITDB_ROOT_USERNAME}'"..., roles: [ { role : "root", db : "admin" } ]})' creates a user with the 'root' role.
    The authentication database for the root user is 'admin'.

This root user is important because all the subsequent 'rs' commands that add or remove replicas need to be executed as this root user.

  • The 2nd createUser() command ->
    mongosh db1 -u ${MONGO_INITDB_ROOT_USERNAME} -p ${MONGO_INITDB_ROOT_PASSWORD} --authenticationDatabase=admin --eval 'db.getSiblingDB("db1").createUser({user : "'${DB_USERNAME}'",pwd : "'${DB_PASSWORD}'", roles: [ { role : "dbAdmin", db : "db1" } ]})'
    creates a second user with the 'dbAdmin' role on the 'db1' database.
  • You can change the 'db1' database name for the 2nd user to whatever you need. That database will be the authentication database for the 2nd user with the 'dbAdmin' role.

The user with the 'dbAdmin' role will be used to carry out all database-level CRUD operations and also to connect mongo-express to the mongo server.

2. For all the secondary mongo replica pods, the command "mongosh deploymongodb-0.mongodb-svc:27017 -u ${MONGO_INITDB_ROOT_USERNAME} -p ${MONGO_INITDB_ROOT_PASSWORD} --authenticationDatabase=admin --eval 'rs.add( { host: "'${HOSTNAME}'.mongodb-svc:27017" } )'" adds subsequent replicas to the primary mongo node.

This rs.add() command is what performs the automatic scale-up of the MongoDB replica set.
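
You can watch this happen by scaling the StatefulSet manually (an illustrative command; it assumes the StatefulSet keeps the name deploymongodb, and in normal operation the HPA described later drives this instead):

# add one secondary; its mongo-user.sh runs rs.add() against the primary on startup
kubectl scale statefulset deploymongodb -n mongo --replicas=2
kubectl get pods -n mongo -w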

— — —

mongodb-key — The ConfigMap mongodb-key contains the key file used for authentication among the replica-set members.

The key file is created using the command:
openssl rand -base64 768
This is a randomly generated key and the most basic form of internal authentication; a more advanced mechanism such as x.509 can be used instead.

The ConfigMap is mounted as a file in the MongoDB pods.
This file is referenced by the '--keyFile=/data/mongodbkey/mongodb.key' flag of the mongod command in the StatefulSet.
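
If you prefer not to paste the key into the manifest by hand, the ConfigMap can also be generated straight from the key file (a sketch; the names match the manifests above):

# generate the key and wrap it in a ConfigMap manifest
openssl rand -base64 768 > mongodb.key
kubectl create configmap mongodb-key -n mongo \
  --from-file=mongodb.key --dry-run=client -o yaml > mongodb-key-configmap.yaml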

mongodb-sc.yaml -> EFS volume as StorageClass

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
reclaimPolicy: Retain
parameters:
  provisioningMode: efs-ap
  fileSystemId: fs-0dawbxxxxxxx
  directoryPerms: "755"
  # basePath: "/"

My deployment uses EFS as the persistent volume for the Kubernetes MongoDB stateful deployment.

  • The StorageClass configuration is used to attach a specific EFS file system to the StatefulSet.

As a pre-requisite, an EFS file system must already exist in the same VPC as the EKS cluster.
Note: The security group of the EKS nodes (EC2 instances) must be added to the network (mount target) section of the EFS.

[Image: EFS with inbound SG from the EKS nodes]
  • The fileSystemId of the EFS StorageClass can be modified as needed from kustomization.yaml. More on this later.
  • The StorageClass is then referenced by the volumeClaimTemplates in the MongoDB StatefulSet.

What exactly is a VolumeClaimTemplate ?

volumeClaimTemplates:
- metadata:
    name: mongodb-persistent-storage
    annotations:
      volume.beta.kubernetes.io/storage-class: "efs-sc"
  spec:
    storageClassName: efs-sc
    accessModes: ["ReadWriteMany"]
    resources:
      requests:
        storage: 1Gi

A volume claim template is used with a StatefulSet to automatically provision a PersistentVolumeClaim and a PersistentVolume for each pod.

  • The persistent volume creates a new EFS access point for every MongoDB replica.
  • The StorageClass is referenced in the volumeClaimTemplates.
  • The accessModes of the PersistentVolume and the volume claim template must match for the EFS volume to mount properly.
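
Each replica therefore ends up with its own claim, named <claim-template-name>-<pod-name> (for example mongodb-persistent-storage-deploymongodb-0). The bindings can be checked with:

# one PVC/PV pair should exist per MongoDB replica
kubectl get pvc,pv -n mongo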

mongodb-svc.yaml -> Headless Kubernetes service to attach to the StatefulSet.

apiVersion: v1
kind: Service
metadata:
  name: mongodb-svc
  namespace: mongo
spec:
  selector:
    app: deploymongodb
  clusterIP: None
  publishNotReadyAddresses: true
  ports:
  - port: 27017
    targetPort: 27017
    protocol: TCP

Why a headless service?

Headless services are used for StatefulSets instead of a normal Kubernetes service. The reason is that a headless service resolves to the individual pods through A records instead of load-balancing across a deployment. So we can send traffic directly to the primary mongo node for write operations.

  • For a k8s service to be headless, 'clusterIP' is set to 'None'.

Demonstration of the headless service:

[Image: DNS lookup of the mongo service]

As we can see from the nslookup command, the mongodb-svc service resolves to the 3 IP addresses of the 3 replicas running under the StatefulSet. This is only possible because the service is headless and resolves to all 3 A records. If the service were not headless, it would resolve to only 1 IP through load-balancing.
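
The same lookup can be reproduced from a throwaway pod (an illustrative command; busybox is simply a small image that ships nslookup):

# the headless service returns one A record per pod
kubectl run dns-test -n mongo --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup mongodb-svc.mongo.svc.cluster.local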

'publishNotReadyAddresses' is set to 'true'. Why this field?

publishNotReadyAddresses needs to be set to 'true' so that DNS resolves to new pod IPs that are still initializing. This is important because the primary/secondary MongoDB nodes need to be reachable through the DNS name <pod-hostname>.<service-name>:27017 in order to initiate a replica node.

mongodb-secrets.yaml — The Kubernetes secret:

        ports:
        - containerPort: 27017

        envFrom:
        - secretRef:
            name: mongo-creds
        livenessProbe:
The secret contains:

  MONGO_INITDB_ROOT_USERNAME: YWRtaW4=
  MONGO_INITDB_ROOT_PASSWORD: YWRtaW4=
  DB_USERNAME: dXNlcjE=
  DB_PASSWORD: dXNlcjE=

  • 'MONGO_INITDB_ROOT_USERNAME/PASSWORD' — the admin username and password having the 'root' role.
  • 'DB_USERNAME/PASSWORD' — the username and password for the 2nd user having the 'dbAdmin' role.
  • The secrets are mounted as environment variables in the StatefulSet via envFrom.
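
The values are plain base64-encoded strings; they can be produced (or verified) like this, where 'admin' matches the sample value above:

# encode a credential for the Secret, and decode it back to verify
echo -n 'admin' | base64       # YWRtaW4=
echo 'YWRtaW4=' | base64 -d    # admin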

kustomization.yaml — Patching the manifest files as needed

resources:
- mongodb-sc.yaml
- mongodb-configmap.yaml
- mongodb-secrets.yaml
- mongodb-svc.yaml
- mongodb-deploy.yaml
- hpa.yaml
- mongoexp-secrets.yaml
- mongoexp-deploy.yaml

namespace: mongo

nameSuffix: -develop

patchesJson6902:
- target:
    version: v1
    kind: StorageClass
    name: efs-sc
  patch: |-
    - op: replace
      path: /parameters/fileSystemId
      value: fs-errbw5xxxxxx

- target:
    version: v1
    kind: HorizontalPodAutoscaler
    name: ashpamongodb
  patch: |-
    - op: replace
      path: /spec/minReplicas
      value: 1
    - op: replace
      path: /spec/maxReplicas
      value: 3

A kustomization template is used to make changes to the deployment process without modifying the main manifest files.

  • First, we list all the manifest files that need to be deployed in the resources section.
  • Modify the namespace of all the deployments in the namespace: section.
  • Modify the EFS fileSystemId to the one created in AWS.
  • Modify the horizontal pod autoscaling replica counts as needed.
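
Before applying, the patched output can be previewed to confirm that the fileSystemId and replica counts were substituted as expected (illustrative):

# render the patched manifests without applying them
kubectl kustomize . | grep -A 2 fileSystemId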

MongoDB ReplicaSet Auto-scaling: scale-up & scale-down configuration

hpa.yaml — Horizontal Pod Autoscaler for scaling based on CPU utilization

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ashpamongodb
  namespace: mongo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: deploymongodb
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 60

How do new MongoDB replicas get added and removed automatically?

When the average CPU utilization across the MongoDB replica-set pods crosses the 'targetCPUUtilizationPercentage' threshold, the number of pods increases automatically until it reaches the 'maxReplicas' limit or the utilization drops back below the threshold. This means the number of secondary mongo nodes grows with load.

Similarly, when the CPU utilization drops below 'targetCPUUtilizationPercentage', the number of secondary mongo nodes is reduced as pods are terminated, until the 'minReplicas' limit is reached or the utilization crosses the threshold again.

The automatic addition and removal of replicas takes place using the 'rs.add' and 'rs.remove' commands.

As a new secondary replica is created, all database data is replicated to the new pod.
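
Note that CPU-based autoscaling relies on the Kubernetes metrics API (metrics-server on EKS). A quick way to confirm metrics are flowing and to watch the autoscaler's decisions (illustrative; the HPA name is the one from hpa.yaml):

# confirm pod metrics are available, then watch the HPA react to load
kubectl top pods -n mongo
kubectl get hpa ashpamongodb -n mongo -w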

Mongo-express manifest files

Mongo Express is a lightweight web-based tool for managing a MongoDB database. I have added the mongo-express deployment and service YAML to my GitHub repo.

        env:
        - name: ME_CONFIG_MONGODB_AUTH_USERNAME
          valueFrom:
            secretKeyRef:
              name: mongo-creds
              key: DB_USERNAME
        - name: ME_CONFIG_MONGODB_AUTH_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongo-creds
              key: DB_PASSWORD
        - name: ME_CONFIG_MONGODB_ENABLE_ADMIN
          value: 'false'

The mongo-express authentication with the mongo server:

  • mongo-express authenticates to MongoDB using the user created with the 'dbAdmin' role.

The mongo-express secrets:

apiVersion: v1
kind: Secret
metadata:
  name: mongo-exp-creds
  namespace: mongo
data:
  ME_CONFIG_MONGODB_URL: bW9uZ29kYjovL3VzZXJuYW1lOnBhc3N3b3JkQGRlcGxveW1vbmdvZGItMC5tb25nb2RiLXN2YyxkZXBsb3ltb25nb2RiLTEubW9uZ29kYi1zdmMsZGVwbG95bW9uZ29kYi0yLm1vbmdvZGItc3ZjOjI3MDE3L2RiMT9yZXBsaWNhU2V0PXJzMA==
  ME_CONFIG_BASICAUTH_USERNAME: bW9uZ29fdXNlcg==
  ME_CONFIG_BASICAUTH_PASSWORD: bW9uZ29fdXNlcg==
  ME_CONFIG_MONGODB_AUTH_DATABASE: ZGIx
  • 'ME_CONFIG_MONGODB_URL' contains the connection string used to connect to the MongoDB server.

The connection string contains all the MongoDB hostnames, '<$HOSTNAME>.<SERVICE_NAME>', separated by commas.

The URL string is in the format:

mongodb://username:password@deploymongodb-0.mongodb-svc,deploymongodb-1.mongodb-svc,deploymongodb-2.mongodb-svc:27017/db1?replicaSet=rs0
  • 'ME_CONFIG_BASICAUTH_USERNAME/PASSWORD' are the credentials for the mongo-express console login.
  • 'ME_CONFIG_MONGODB_AUTH_DATABASE' is the database that mongo-express will have access to.

'ME_CONFIG_MONGODB_AUTH_DATABASE' must be the same as the authentication database of the 'ME_CONFIG_MONGODB_AUTH_USERNAME' user defined in the mongo-express deployment.
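
Since the URL is stored base64-encoded, it helps to decode it when debugging connection issues and to re-encode it after editing (illustrative commands; -w 0 is the GNU base64 flag that disables line wrapping):

# inspect the currently deployed connection string
kubectl get secret mongo-exp-creds -n mongo -o jsonpath='{.data.ME_CONFIG_MONGODB_URL}' | base64 -d

# rebuild it after changing the credentials or database
echo -n 'mongodb://username:password@deploymongodb-0.mongodb-svc,deploymongodb-1.mongodb-svc,deploymongodb-2.mongodb-svc:27017/db1?replicaSet=rs0' | base64 -w 0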

How to deploy on AWS EKS ?

Follow these steps:
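
A minimal sketch of those steps, assuming the manifests from the GitHub repository above are checked out locally, kubectl points at the EKS cluster, and kustomization.yaml sits at the repository root:

# clone the manifests and apply them with kustomize
git clone https://github.com/ankan-devops/k8s-mongo-replicaset.git
cd k8s-mongo-replicaset
kubectl create namespace mongo   # skip if the namespace already exists
kubectl apply -k .
kubectl get pods -n mongo -w     # wait for the MongoDB pods to become ready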

Once the pods are ready, you can access the MongoDB server from inside the Kubernetes cluster using the following URLs:

For both Read & Write operations:

mongodb://<username>:<password>@deploymongodb-0.mongodb-svc.<namespace>.svc.cluster.local,deploymongodb-1.mongodb-svc.<namespace>.svc.cluster.local,deploymongodb-2.mongodb-svc.<namespace>.svc.cluster.local:27017/<mongo_database>

or, as a mongosh invocation,

mongosh mongodb://deploymongodb-0.mongodb-svc.<namespace>.svc.cluster.local,deploymongodb-1.mongodb-svc.<namespace>.svc.cluster.local,deploymongodb-2.mongodb-svc.<namespace>.svc.cluster.local:27017/<mongo_database> -u <username> -p <password> --authenticationDatabase=<mongo_database>

For Read-only operations:

mongodb://<username>:<password>@deploymongodb-0.mongodb-svc.<namespace>.svc.cluster.local,deploymongodb-1.mongodb-svc.<namespace>.svc.cluster.local,deploymongodb-2.mongodb-svc.<namespace>.svc.cluster.local:27017/<mongo_database>?readPreference=secondaryPreferred

Note:

  • The DNS name of each MongoDB node (primary & secondary), created through the k8s service, needs to be added to the URL, separated by commas.
    The DNS names of the individual k8s pods are of the format:
    <hostname>.<k8s_service_name>.<namespace>.svc.cluster.local
  • The 'readPreference' option is used to prioritize secondary mongo nodes for read operations.
    By default, all read and write queries are sent to the primary mongo node for data consistency, since replication may take time.
  • Make sure the user in <username> is created on <mongo_database> and uses that database for authentication.

To access mongo-express from outside the cluster, attach an Ingress to the mongo-express service.

Testing out Auto-scaling & Replication

Testing Auto-scaling:

First, let's increase the number of replica pods manually by setting 'minReplicas:' to 3 in hpa.yaml or kustomization.yaml.

  patch: |-
    - op: replace
      path: /spec/minReplicas
      value: 3
    - op: replace
      path: /spec/maxReplicas
      value: 3

We can see the number of replica pods has increased to 3.

[Image: Number of pods increased to 3 after HorizontalPodAutoscaler scaling]

Therefore, the scaling is successful.

In a real-world scenario, auto-scaling will happen based on 'Target CPU Utilization', when the average CPU usage of the pods crosses the threshold defined in hpa.yaml.

— — —

Testing Replication:

We will test whether data replication is working by creating a collection on the primary node and checking whether it appears on a secondary node.

Note: Write operations can only happen on the primary.

Steps:

  • First, we connect to the primary mongo.
  • Then, we create a collection called ‘test’.

Running the commands:

mongosh mongodb://mongodb-svc:27017/db1 -u ${DB_USERNAME} -p ${DB_PASSWORD}

>db.createCollection("test")

Output:

[Image: Creating a DB collection on the primary mongo node]

We can see a new collection named 'test' got created on the primary mongo node.

Checking if the data is replicated:

Steps:

  • Connect to the secondary mongo node
  • Query the list of mongo collections

Running the commands:

mongosh mongodb://mongodb-svc:27017/db1?readPreference=secondaryPreferred -u ${DB_USERNAME} -p ${DB_PASSWORD}

>db.getCollectionNames()

Output:

[Image: Data getting replicated on the secondary node]

We can see the collection name is replicated on the secondary mongo pods.

Therefore, replication is successful !

— — —

Testing Auto-Failover

  • As seen in the image below, the primary mongo node was initially on the deploymongodb-0 pod.
[Image: Primary node on the deploymongodb-0 pod]
  • After deleting the primary pod, the secondary mongo node deploymongodb-2 was promoted to primary.
[Image: deploymongodb-2 pod now primary]
  • The deploymongodb-0 pod has now been demoted to a secondary mongo node.
[Image: deploymongodb-0 pod now as a secondary mongo node]

Thus, auto-failover is successful. A secondary mongo node got promoted to primary after the initial primary node got degraded!
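
To see which member is primary at any point during such a test, you can trigger the failover and then ask any member for the current primary (an illustrative check using the same credentials as before):

# delete the current primary to trigger an election
kubectl delete pod deploymongodb-0 -n mongo

# db.hello().primary reports the current primary's host:port
mongosh mongodb-svc:27017 -u $MONGO_INITDB_ROOT_USERNAME -p $MONGO_INITDB_ROOT_PASSWORD \
  --authenticationDatabase=admin --quiet --eval 'db.hello().primary'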
