Data Archival Solution for on-premises MongoDB

Sachin Sharma
Airtel Digital
Jul 17, 2023

PREFACE

In 2020, while revamping the complete Digital Payments and Checkout system at Airtel, we decided to split our monolith into smaller blocks (microservices), each with its own granular, single responsibility. In the process, we also migrated from an Oracle SQL database to MongoDB. We run an on-premises MongoDB cluster.

As we scaled to millions of transactions per day, the data grew to terabytes, and we had to start moving older data out of our live database. Oracle Cloud provides auto-archival solutions, and we needed something similar for MongoDB. That is when we built this utility, which automatically moves data from one MongoDB instance to another.

BASIC FLOW FOR ARCHIVAL

Data Archival Process

The idea is to use a timestamp field that holds the creation time of each document. The utility reads only the data created after the last archived document; this ensures zero duplication and prevents data loss in case of failures, since the archival DB controls the range of data to be transferred.
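To make this concrete, here is a minimal sketch of that read logic using the MongoDB Java driver. The collection handles and the field name createdAt are assumptions for illustration, not the utility's actual code.

import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Sorts;
import org.bson.Document;

import java.util.Date;

public class ArchivalCursor {

    // The archive DB controls the range: its newest document marks the low-water mark.
    static Date lastArchivedAt(MongoCollection<Document> archive) {
        Document last = archive.find().sort(Sorts.descending("createdAt")).limit(1).first();
        return last == null ? new Date(0) : last.getDate("createdAt");
    }

    // Read strictly after that mark so re-runs neither duplicate nor skip documents.
    static Iterable<Document> documentsToArchive(MongoCollection<Document> live,
                                                 MongoCollection<Document> archive,
                                                 int readSize) {
        return live.find(Filters.gt("createdAt", lastArchivedAt(archive)))
                   .sort(Sorts.ascending("createdAt"))
                   .limit(readSize);
    }
}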

IMPLEMENTATION

For the implementation, we decided to use Spring Batch. Spring Batch provides a robust, lightweight, and reusable batch architecture for executing different batch jobs. It is easily scalable, builds on core features of the Spring Framework, and maintains a Job Repository containing information about all the jobs that have been run.

Spring Batch collects metrics (such as job duration, step duration, item read and write throughput, and others) and registers them in Micrometer’s global metrics registry under the spring.batch prefix. These metrics can be sent to any monitoring system supported by Micrometer.

To set up a job, our application needs a metadata document: a JSON document that contains all the information necessary to execute the job. Here are the parameters needed:

Archival MetaData
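As a rough illustration, the metadata could be modelled along these lines. The field names below are assumptions pieced together from the parameters discussed in this article; the actual list is in the table above.

// Illustrative only: field names are assumptions, not the utility's actual schema.
public record ArchivalMetaData(
        String jobName,
        String sourceMongoUri,      // live (on-premises) MongoDB instance
        String archiveMongoUri,     // archival MongoDB instance
        String collectionName,
        String timestampField,      // creation-time field used to order documents
        int readSize,               // documents fetched per read
        int chunkSize,              // documents written per chunk
        int skipDays,               // only archive data older than T - skipDays
        String callbackUri          // optional: execution details are POSTed here
) {}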

These data points are used to create our Archival Job. This is a Spring Batch job consisting of 5 steps, as shown below:

Mongo Archive Utility Steps

The job first loads the metadata, then uses it to establish connections with both MongoDB instances and transfer data as per the flow above. We added an extra step to ensure the MongoDB connections are closed once archival is done, as they are no longer needed. The application therefore connects to your MongoDB only for the duration of the job run and puts no additional load on your database outside it. It is advisable to run the archival job during lean hours.
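As a sketch of the wiring, such a job could be assembled with Spring Batch 5's JobBuilder roughly as follows. The step names are placeholders standing in for the steps in the figure above, not the utility's actual bean names.

import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ArchivalJobConfig {

    @Bean
    public Job archivalJob(JobRepository jobRepository,
                           Step loadMetaDataStep,
                           Step openConnectionsStep,
                           Step transferDataStep,
                           Step closeConnectionsStep) {
        return new JobBuilder("testJob", jobRepository)
                .start(loadMetaDataStep)     // read the archival metadata document
                .next(openConnectionsStep)   // connect to the live and archive MongoDB instances
                .next(transferDataStep)      // chunk-oriented read from live, write to archive
                .next(closeConnectionsStep)  // release MongoDB connections once archival is done
                .build();
    }
}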

Our utility is a web application which, on startup, sets up a job with all the above steps for every metadata document loaded in the application's context.

EXECUTION

Our utility exposes a REST endpoint which can be used to launch a job. As a client, once the job is configured, you can use this endpoint to launch it as and when needed.

curl --location '{{application-end-point}}/v1/job/archive/launch' \
--header 'Content-Type: application/json' \
--header 'Authorization: xxxxxx' \
--data '{
"jobName": "testJob",
"jobParametersMap": {}
}'

It is a POST API which needs only 2 attributes in the request body:

  1. jobName: A mandatory attribute; The name of the job to be executed.
  2. jobParametersMap: Optional; a map of type <String, String> for adding parameters to the job. It can hold any metadata the client may need to identify or analyse the job execution. It is not used by the utility itself; it is simply saved in the job execution context and returned later.

If the job is launched successfully, the API responds with HTTP status 200.
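Internally, such an endpoint can hand the request over to Spring Batch's JobLauncher. Here is a hedged sketch with illustrative class names, not the utility's actual code:

import java.util.Map;

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.configuration.JobRegistry;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/v1/job/archive")
public class ArchiveJobController {

    private final JobLauncher jobLauncher;
    private final JobRegistry jobRegistry;

    public ArchiveJobController(JobLauncher jobLauncher, JobRegistry jobRegistry) {
        this.jobLauncher = jobLauncher;
        this.jobRegistry = jobRegistry;
    }

    @PostMapping("/launch")
    public ResponseEntity<String> launch(@RequestBody LaunchRequest request) throws Exception {
        JobParametersBuilder params = new JobParametersBuilder()
                .addLong("launchedAt", System.currentTimeMillis());   // make each run unique
        request.jobParametersMap().forEach(params::addString);        // client-supplied metadata
        Job job = jobRegistry.getJob(request.jobName());
        jobLauncher.run(job, params.toJobParameters());
        return ResponseEntity.ok("Job launched");
    }

    public record LaunchRequest(String jobName, Map<String, String> jobParametersMap) {}
}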

If the metadata of the job contains a call back URI, the execution details are sent to that endpoint once execution completes.

We have a second API which can be used to check the state of job execution. It returns the state of the currently executing job instance, if any, or the state of the last executed job.

curl --location '{{application-end-point}}/v1/job/archive/status?jobName=testJob' \
--header 'Authorization: xxxx'

This is a GET API and expects a query param jobName.

ARCHIVAL EXECUTED JOB INFO

This is a data class we use to convey the state of a job execution. After a job finishes, it is sent as a payload to the callback URI present in the archival metadata and is also returned as the response of the Job Status API. Here are the attributes present in this data class:

ArchivalExecutedJobInfo
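For reference, here is a hedged sketch of what such a data class could look like. The attribute names below are purely illustrative; the actual attributes are those listed in the table above.

import java.util.Map;

// Illustrative only: the real attributes are listed in the table above.
public record ArchivalExecutedJobInfo(
        String jobName,
        String jobStatus,        // e.g. COMPLETED or FAILED
        String startTime,
        String endTime,
        long readCount,          // documents read from the live instance
        long writeCount,         // documents written to the archive instance
        Map<String, String> jobParametersMap  // returned as supplied at launch
) {}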

MONITORING

Spring Batch provides batch execution metrics via Micrometer. These metrics can be pushed into any time-series DB, from where they can be used to create dashboards and set up alerts. We use InfluxDB and Grafana for this. The metrics exposed by Spring Batch are:

Spring Batch Exposed Metrics
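A minimal sketch of shipping these metrics to InfluxDB is to add an InfluxMeterRegistry to Micrometer's global registry (assuming the micrometer-registry-influx dependency). The URI and database name below are placeholders, not our production configuration:

import io.micrometer.core.instrument.Clock;
import io.micrometer.core.instrument.Metrics;
import io.micrometer.influx.InfluxConfig;
import io.micrometer.influx.InfluxMeterRegistry;

public class MetricsBootstrap {

    public static void registerInflux() {
        InfluxConfig config = new InfluxConfig() {
            @Override
            public String uri() { return "http://influx-host:8086"; } // placeholder endpoint
            @Override
            public String db() { return "batch_metrics"; }            // placeholder database
            @Override
            public String get(String key) { return null; }            // use Micrometer defaults otherwise
        };
        // Spring Batch registers its spring.batch.* meters in the global registry,
        // so adding the Influx registry here makes them available for Grafana dashboards.
        Metrics.addRegistry(new InfluxMeterRegistry(config, Clock.SYSTEM));
    }
}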

Here are a couple of sample graphs we have set up, depicting read/write counts as well as executions and active tasks in Spring Batch.

Grafana Dashboard using Micrometer Metrics

In addition to these, we also use our very own SMART API (which I mentioned in a previous article here) to push the Archival Executed Job Info to Elastic, and then use Kibana for visualisation and alerting. Here are some sample graphs plotted using the same:

Graphs Plotted in Kibana using Data pushed via SMART API.

CONCLUSION

We have been using this utility to successfully transfer data to our archive database for over 2 years now. We operate with a read size and chunk size of 10,000 documents each, and archiving around 3 million documents takes about 30 minutes with around 6–7 jobs running in parallel on a single machine.

We have configured a skip of 7 days and have schedulers in place that launch the jobs at midnight daily, so the utility transfers one day's worth of data (the T-7th day) every day. If the utility fails to run on any given day, it is intelligent enough to transfer the data of both the T-8th and T-7th day on the next run. We have set up a TTL index of 15 days in our live database to auto-purge data, which means the utility has at least 8 attempts to move any given day's data.
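For completeness, here is a hedged sketch of how such a 15-day TTL index can be created with the MongoDB Java driver. The connection string, database, collection, and field name are placeholders:

import java.util.concurrent.TimeUnit;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

public class TtlIndexSetup {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://live-mongo:27017")) { // placeholder URI
            MongoCollection<Document> live = client.getDatabase("payments")            // placeholder names
                                                   .getCollection("transactions");
            // Documents expire 15 days after their creation timestamp, which,
            // with a T-7 schedule, leaves roughly 8 daily attempts to archive them.
            live.createIndex(Indexes.ascending("createdAt"),
                             new IndexOptions().expireAfter(15L, TimeUnit.DAYS));
        }
    }
}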

With all the dashboards and alerts in place, we can be sure of not losing any data.
