MongoDB replication in real time
Hands-on example of implementing data synchronization between two Mongo instances with Grainite.
In the real-time database synchronization blog, we talked about the challenges developers face in leveraging traditional tools to achieve data synchronization between multiple data sources. We also provided an overview of the benefits of Grainite and how Grainite simplifies database synchronization.
This blog provides a step-by-step example of synchronizing data between two MongoDB instances using Grainite. By the end of this blog, you will have:
- Brought up the Mongo and Grainite Docker instances
- Added dummy data to test the two-way replication
- Seen how to connect your own Mongo instances to replicate actual data in real time
Prior to running this example, if you would like a high-level overview of Grainite, you can read the conceptual model in our documentation.
Prerequisites
This demo has been developed using the Java programming language. To run this successfully, you should have the following software packages installed on your Mac/Linux machine.
- gx command line interface tool
- JDK 11 or above (JDK 17 recommended)
- Docker and Docker Compose
You can refer to the environment setup in our documentation for additional instructions.
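As a quick sanity check before continuing, the following sketch reports whether each prerequisite is available on your PATH. The check_tools helper is our own illustrative name, not part of gx or Grainite, and it only tests for presence, not for versions.

```shell
# check_tools: hypothetical helper (not part of gx or Grainite).
# Prints "<tool>: found" or "<tool>: missing" for each argument and
# returns non-zero if at least one tool is missing.
check_tools() {
  local tool rc=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: missing"
      rc=1
    fi
  done
  return "$rc"
}
```

For this demo, you could call it as check_tools gx java docker after pasting the function into your shell.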
Install gx CLI
gx is a command line interface (CLI) tool provided by Grainite. Reviewing the capabilities of this tool is recommended, but doing so is not required to walk through the demo.
- Download the gx CLI installer script from https://gitlab.com/api/v4/projects/41132986/releases/permalink/latest/downloads/gx-bin/install_gx.sh
- Run the installer script:
chmod +x install_gx.sh && sudo ./install_gx.sh
- gx should now be installed in /usr/local/bin. To confirm, run the following:
gx --version
Install Java
Java is required to build the app using mvnw, as well as to run the app.
Install Docker and Docker Compose
Docker Compose is required to launch Grainite, along with two Mongo instances. Follow the instructions on this page to install Docker Desktop or Docker Engine.
Running the demo
Download the package
There are multiple Grainite applications available in the samples repository. You can clone the repository using the following command:
git clone https://gitlab.com/grainite/samples
The samples/cdc/mongo_cdc directory contains the files used in this demo.
cd samples/cdc/mongo_cdc
Start the Docker containers
Run the following command to start Grainite and the two Mongo instances.
docker compose pull && docker compose up -d
Verify that the containers have been created without any errors. The mongo_one_init and mongo_two_init containers are used to initialize the Mongo replica sets for both instances. They will not be required when you connect your own MongoDB instances.
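One way to check the containers is docker compose ps, which lists the containers of the current Compose project. The sketch below goes a step further and prints the state of specific containers by name. The container_status helper is a hypothetical name of our own, and we assume the container names match the ones used elsewhere in this blog (mongo_one and mongo_two).

```shell
# container_status: hypothetical helper (not part of Docker or Grainite).
# Prints "<name>: <state>" for each container, using the state that
# docker inspect reports (for example running or exited), or
# "<name>: not found" if docker inspect fails for that name.
container_status() {
  local name state
  for name in "$@"; do
    if state="$(docker inspect -f '{{.State.Status}}' "$name" 2>/dev/null)"; then
      echo "$name: $state"
    else
      echo "$name: not found"
    fi
  done
}
```

For example, container_status mongo_one mongo_two should report running for both Mongo instances once the stack is up.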
Build and load the application
- Run the following command to build the application
./mvnw clean package
- Once the application has been built, run the following command to load the Grainite application.
gx load -c app.yaml
- Verify the app has been successfully loaded
gx app ls
Connect to Mongo instances
Grainite is frequently used in situations where other services or products are already running. Often, it is necessary to either obtain data from or send data to those other products. As opposed to building integrations for those products into the core Grainite product, we’ve introduced Extensions. Extensions are individual packages that contain Tasks, Handlers, and other common code needed to integrate with those products.
Grainite contains various extensions for Azure, Debezium, JDBC, Kafka, SQL Server, and Salesforce. You can read about Grainite Extensions in our documentation.
Run the following commands to have the tasks poll both the Mongo instances for changes:
gx task start mongo_cdc_one -c app.yaml
gx task start mongo_cdc_two -c app.yaml
To confirm that both tasks are working, use gx mon (monitoring command in gx CLI) to see the counters.
gx mon
The Message Flow section shows how many events were appended to each topic and how messages flow between topics and tables.
The Action Status section indicates how many times an action was invoked.
For example, values of 39 and 28 for mongo_cdc_one_doWork and mongo_cdc_two_doWork indicate that these actions were invoked 39 and 28 times, respectively. The doWork action is invoked frequently because it polls the Mongo instances for changes. The startTask (used to start a task) and startTaskInstance (used to start a task instance) action counters indicate how many times those methods were called.
Finally, the App Counters section contains app-specific counters. Users can add such counters to any app by using the counters/gauges API. In this app, the cdc_controller_created counter indicates how many times a controller was created to poll the Mongo instances. Since there are two tasks, one polling each Mongo instance, the count is 2. The task_execution_status gauge indicates the status of a task. Use curl localhost:5064/export-dashboard | grep "task_execution_status" to see the label for this gauge. The task_instance_start counter indicates the number of times a task instance was started.
Add entries to the Mongo instances
Follow these steps to add dummy data to each database and confirm that it is replicated in both directions.
- Run the following to add some dummy data to mongo_one’s test db and customers collection:
./run.sh mongodb://localhost:27017 test customers 10 10
- Confirm records were inserted into mongo_one by running the following command to dump the collection:
docker exec -it mongo_one mongosh --eval "db.getSiblingDB('test').getCollection('customers').find()"
- Additionally, confirm records were synced into mongo_two by running the following command to dump the collection:
docker exec -it mongo_two mongosh --eval "db.getSiblingDB('test').getCollection('customers').find()"
- Run the following to add some dummy data to mongo_two’s test db and customers collection:
./run.sh mongodb://localhost:27018 test customers 10 10
- Confirm records were inserted into mongo_two by running the following command to dump the collection:
docker exec -it mongo_two mongosh --eval "db.getSiblingDB('test').getCollection('customers').find()"
- Additionally, confirm records were synced into mongo_one by running the following command to dump the collection:
docker exec -it mongo_one mongosh --eval "db.getSiblingDB('test').getCollection('customers').find()"
- Run the Verify program to confirm all records have been synced:
java -cp target/mongocdc-jar-with-dependencies.jar org.samples.mongocdc.Verify
Note: To get the number of documents in a collection, you can run the following:
For mongo_one:
docker exec -it mongo_one mongosh --eval "db.getSiblingDB('test').getCollection('customers').countDocuments()"
For mongo_two:
docker exec -it mongo_two mongosh --eval "db.getSiblingDB('test').getCollection('customers').countDocuments()"
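The two count commands above can be combined into a rough sync check. The sketch below polls both instances until the customers collections hold the same number of documents. The function names (count_docs, counts_match, wait_for_sync) and the default of 10 attempts are our own choices, not part of the sample. Equal counts only suggest the collections match, so the Verify program remains the authoritative check.

```shell
# Hypothetical helpers (not part of the Grainite sample).

# Print the number of documents in test.customers for the given container.
# --quiet suppresses the mongosh startup banner so only the count is printed.
count_docs() {
  docker exec "$1" mongosh --quiet --eval \
    "db.getSiblingDB('test').getCollection('customers').countDocuments()"
}

# Succeed only when both arguments are equal integers.
counts_match() {
  [ "$1" -eq "$2" ] 2>/dev/null
}

# Poll once per second, up to $1 attempts (default 10), until the counts match.
wait_for_sync() {
  local attempts="${1:-10}" one two i=0
  while [ "$i" -lt "$attempts" ]; do
    one="$(count_docs mongo_one)"
    two="$(count_docs mongo_two)"
    if counts_match "$one" "$two"; then
      echo "in sync: $one documents"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "not in sync: mongo_one=$one mongo_two=$two" >&2
  return 1
}
```

After running ./run.sh against either instance, calling wait_for_sync with no arguments gives a quick pass/fail signal before you run the Verify program.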
Connecting to your own Mongo Instances
To connect to your own Mongo instances, edit app.yaml: change the connection strings to point to your Mongo instances, and update the database and collection names to match.
Summary
While this example focused on real-time replication between two Mongo instances, Grainite makes it easy to move data between any two database instances. Grainite supports multiple extensions that can help move data from sources such as Apache Kafka, Azure, Debezium, Salesforce, and SQL Server and replicate the data to downstream databases and data warehouses. In addition to making the replication real-time, Grainite automatically handles failures, retries, and scaling, and enables complex, stateful transformations.
Additional samples that demonstrate data replication from MongoDB to SQL Server, SQL Server to SQL Server, SQL Server to Kafka, and Salesforce to SQL Server will be added to the samples repository soon.