Auto-healing MongoDB replication with Terraform

Data is the next big thing and this has accentuated the importance of the way you manage your database. There are various operations that you perform with your database such as replication, partitioning, sharding, etc. These operations, most of the times, are not necessarily mutually exclusive. Database replication is one of the key operations when it comes to data recovery, fault tolerance, and consistency.

It’s almost a decade now that MongoDB is pretty much there in the database landscape. I am sure pretty much everyone who has tried their hands on MongoDB is aware of what replica set and sharding is and when it comes to the fault tolerance, replica set is great to have in your infrastructure.

In this blog, I am going to talk about how you can provision MongoDB replica set on GCP compute engine using Terraform that will be capable of self-healing in case of node failure. Plus, in order to manage load balancing, I want it to be auto-scaling. Following are a few major points I will be talking about in this blog:

*Steps to create GCP Managed Instance Group (MIG) using compute instance template

*Startup script for compute instance template resource

  • To install MongoDB on nodes
  • To find minimum IP within the nodes present in MIG
  • To declare node with IP as a primary
  • To add cron job for service discovery

*Manual verification of auto scale and self-healing MongoDB replication nodes

*Terraform module conversion.

But before starting with the actual coding, let me clear the difference between auto-scaling and auto-healing. Auto-scaling helps replication to increase or decrease the total number of nodes in a replica set and is mainly used for load balancing. Consider a scenario where 3 nodes are handling the traffic to 90% of their capacity. Auto-scaling will spin-up a new node in a replica set to manage the load, in case the load increases. Whereas auto-healing is the next avenue in auto-scaling. Auto-healing helps the newer nodes to sync with the existing nodes. It also helps to add a new node to replace a dead node.

So just a few days back, while I was working on a project for one of our clients, I got a task in which I had to modify one Terraform script to make it self-healing. The script previously had replica set setup on GCP Compute Engine and it needed few improvements to have auto-healing, fault-tolerance, and minimum replica set member management in order to achieve load balancing. The script was based on separate Compute Engines. So, the first obvious step was to scrap the existing script and redraw the architecture of the replica set. My initial approach was to utilize the existing features and resources to minimize my work. So, I moved to GCP Managed Instance Group (A managed instance group uses an instance template to create a group of identical instances. You can control a managed instance group as a single entity.) in order to achieve auto-scaling. So, let’s see how you can provision GCP Managed Instance Group using Terraform.

Create a GCP Instance Group Manager:…read more.