Architecting a DR solution for Cloud Firestore in regions without multi-region support

Vinod Patel
Google Cloud - Community
Jan 23, 2023

Disclosure: All opinions expressed in this article are my own and do not represent those of my current or any previous employers.

Firestore is a Google-managed serverless NoSQL document database built for automatic scaling, high performance, and ease of application development. Firestore offers 99.99% availability for regional deployments, compared to 99.999% for multi-region instances. However, many GCP regions, such as Montreal and Sydney, do not yet support multi-region deployments of Firestore instances. See the Cloud Firestore locations documentation for the current list.

This blog explores different options and strategies to build a robust and automated backup and disaster recovery (DR) process for Firestore instances with single region deployment capability.

Disaster Recovery on Cloud

The term cloud disaster recovery refers to the strategies and services enterprises apply for the purpose of backing up applications, resources, and data into a cloud environment.

Cloud DR helps protect enterprise cloud resources and ensure business continuity. If disaster occurs, enterprises can quickly restore and resume normal operations. Another key advantage is the ability to automate many processes and quickly scale according to business requirements and needs.

Service level objectives are the key measurements which drive the disaster recovery and BCP (Business Continuity Plan) strategies for an application. Two such important SLOs are RPO and RTO.

Recovery Point Objective (RPO) is your goal for the maximum amount of data an application can tolerate losing. This parameter is measured in time: from the moment a failure occurs back to your last valid data backup. For example, if an application fails now and its last full data backup was taken 8 hours ago, up to 8 hours of data is lost, which is acceptable only if the RPO is 8 hours or more.

Recovery Time Objective (RTO) is the goal your organisation sets for the maximum length of time an application should take to restore normal operations following an outage.
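
To make the RPO idea concrete for a scheduled-backup design, here is a minimal sketch in Python. It assumes a backup only becomes usable once it has been exported and copied to the DR location; the function names and the sample durations are hypothetical and chosen only for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(
    export_interval: timedelta,
    export_duration: timedelta,
    copy_duration: timedelta,
) -> timedelta:
    """Upper bound on lost data for a scheduled backup pipeline.

    If the primary fails just before the next backup becomes usable in the
    DR location, the newest usable backup reflects the state at the start
    of the previous export, so the loss spans one full interval plus the
    time the next backup would have needed to export and copy.
    """
    return export_interval + export_duration + copy_duration


def meets_rpo(rpo: timedelta, data_loss: timedelta) -> bool:
    """A design meets the RPO when its worst-case loss does not exceed it."""
    return data_loss <= rpo


# Hypothetical numbers: export every 6 hours, 30 min to export, 15 min to copy.
loss = worst_case_data_loss(
    timedelta(hours=6), timedelta(minutes=30), timedelta(minutes=15)
)
within_8h_rpo = meets_rpo(timedelta(hours=8), loss)
within_4h_rpo = meets_rpo(timedelta(hours=4), loss)
```

With these sample numbers the worst case is 6 hours 45 minutes, which fits an 8 hour RPO but not a 4 hour one.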

Let's jump into the options.

Both approaches suggested here use an active-passive architecture, with the active instance in the primary region hydrating the secondary instance in the DR region. Customers must consider data residency requirements and choose the DR region for Firestore accordingly.

1. Export/Import

This feature is available out of the box in Cloud Firestore. Refer to the reference architecture below to implement it.

Solution Highlights:

  • The proposed solution performs scheduled exports from the primary instance into a co-located regional bucket, moves the exported copy to a bucket in the DR location, and restores it into the DR instance
  • These steps (export, movement and import) can be orchestrated via Cloud Scheduler, Cloud Pub/Sub and Cloud Functions
  • The solution leverages the LevelDB-format export and import features of Firestore.

Reference — https://cloud.google.com/firestore/docs/manage-data/export-import
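
The export and import steps above can be sketched in Python as follows. This is a rough sketch, not a definitive implementation: it assumes the `google-cloud-firestore` package and its `FirestoreAdminClient`, and the bucket layout, project IDs and helper names are hypothetical. The client library is imported lazily so the naming helpers can be used without it.

```python
from datetime import datetime, timezone


def database_name(project_id: str, database_id: str = "(default)") -> str:
    """Resource name of a Firestore database as used by the Admin API."""
    return f"projects/{project_id}/databases/{database_id}"


def export_prefix(bucket: str, started_at: datetime) -> str:
    """Timestamped folder for one export (the folder layout is an assumption)."""
    stamp = started_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return f"gs://{bucket}/firestore-exports/{stamp}"


def start_export(project_id: str, bucket: str, started_at: datetime):
    """Start an export from the primary instance into a co-located bucket."""
    from google.cloud import firestore_admin_v1  # requires google-cloud-firestore

    client = firestore_admin_v1.FirestoreAdminClient()
    # Returns a long-running operation; the export runs in the background.
    return client.export_documents(
        request={
            "name": database_name(project_id),
            "output_uri_prefix": export_prefix(bucket, started_at),
        }
    )


def start_import(dr_project_id: str, input_uri_prefix: str):
    """Start an import into the DR instance from the copied export folder."""
    from google.cloud import firestore_admin_v1  # requires google-cloud-firestore

    client = firestore_admin_v1.FirestoreAdminClient()
    return client.import_documents(
        request={
            "name": database_name(dr_project_id),
            "input_uri_prefix": input_uri_prefix,
        }
    )


# Hypothetical values, shown only to illustrate the naming helpers.
example_prefix = export_prefix(
    "primary-exports", datetime(2023, 1, 23, 10, 0, 0, tzinfo=timezone.utc)
)
```

In a Cloud Scheduler and Pub/Sub setup, `start_export` and `start_import` would each be called from their own Cloud Function, with the bucket-to-bucket copy as a separate step in between.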

PS: Cloud Firestore does not offer a managed backup and restoration solution at the time of writing this blog.

Some considerations for the Export/Import option:

  • The frequency of export/import should be decided according to the service level objectives
  • Decide on scheduled vs on-demand restoration based on the service levels. (Write costs can be saved with on-demand restoration from the backups.)
  • Firestore import does not remove documents that have been deleted in the primary instance; these must be handled explicitly
  • Housekeeping activities such as retention and archival of backup files in GCS buckets will be required.
  • Billing will include reading documents from the Firestore instance, GCS bucket storage and writing to the DR instance
  • The size of the database and its year-over-year growth rate are important factors to consider, because they may lead to higher cost and longer recovery time.
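
One way to handle the deleted-documents consideration is to compare document paths after an import and remove whatever exists only in the DR instance. The sketch below shows just the comparison; how the two path lists are collected and how the leftovers are deleted is left out, and the function name is hypothetical.

```python
from typing import Iterable, Set


def stale_document_paths(
    primary_paths: Iterable[str], dr_paths: Iterable[str]
) -> Set[str]:
    """Paths present in the DR instance but no longer in the primary.

    An import only writes the documents contained in the export, so a
    document deleted in the primary since the previous import would
    otherwise linger in the DR instance.
    """
    return set(dr_paths) - set(primary_paths)


# Hypothetical paths: "users/bob" was deleted in the primary after the last import.
to_delete = stale_document_paths(
    primary_paths=["users/alice", "users/carol"],
    dr_paths=["users/alice", "users/bob", "users/carol"],
)
```

For large databases, listing every path on both sides has its own read cost, so this check may be better run on a slower schedule than the import itself.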

2. Event Based Replication

The solution below leverages Firestore event triggers to hydrate data into the DR instance.

Solution Highlights:

  • Perform real-time replication of events from the primary instance to the DR instance via Firestore event triggers
  • A lower RPO can be achieved with real-time replication of events.
  • The replication of events from the primary instance to the DR instance is handled by a Cloud Function
  • For the Firestore “write” event, the Cloud Function receives two objects, an event object and a context object, which can be used to determine the event type (i.e. insert/update/delete) and take the corresponding action on the DR Firestore instance.

References —

https://cloud.google.com/functions/docs/calling/cloud-firestore

https://cloud.google.com/functions/docs/calling/cloud-firestore#event_structure
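
As a rough illustration of how a handler might tell inserts, updates and deletes apart, here is a sketch in Python. It assumes the Gen 1 event structure described in the reference above, where the event payload carries `oldValue` and `value` and the context carries the document's resource path; the helper names are hypothetical and the actual write to the DR instance is left out.

```python
from typing import Tuple


def classify_change(data: dict) -> str:
    """Derive the change type from a Firestore "write" event payload.

    Assumption: "oldValue" is empty for a newly created document and
    "value" is empty for a deleted one.
    """
    has_old = bool(data.get("oldValue"))
    has_new = bool(data.get("value"))
    if has_new and not has_old:
        return "create"
    if has_old and not has_new:
        return "delete"
    if has_old and has_new:
        return "update"
    raise ValueError("event carries neither an old nor a new document")


def document_path(resource: str) -> str:
    """Strip the database prefix from the resource name in the context."""
    # ".../databases/(default)/documents/users/alice" -> "users/alice"
    return resource.split("/documents/", 1)[1]


def plan_replication(data: dict, resource: str) -> Tuple[str, str]:
    """Return (action, document path) for the DR instance to apply."""
    return classify_change(data), document_path(resource)


# Hypothetical payload for a newly inserted document.
sample_event = {
    "oldValue": {},
    "value": {"name": "ignored", "fields": {"city": {"stringValue": "Montreal"}}},
}
sample_resource = "projects/p/databases/(default)/documents/users/alice"
action, path = plan_replication(sample_event, sample_resource)
```

A Gen 1 entry point of the form `def replicate(data, context)` could call `plan_replication(data, context.resource)` and then set or delete the document at the returned path in the DR instance.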

Considerations for the real-time replication option:

  • Cloud Functions does not guarantee the ordering of events. This solution may therefore not be a good fit where concurrency is very high and simultaneous writes to the same document are likely.
  • The solution needs to be tested at peak writes per second to check the consistency of replication.
  • Event-based replication works at the collection level: each unique collection needs its own function to handle its writes.
  • Cloud Functions Gen 1 supports the Firestore event triggers.
  • Billing will include charges for Cloud Functions invocations and for write operations performed on the DR instance.
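
The ordering consideration above can be softened with a last-write-wins guard: apply an event only if it is newer than the last change already applied to the same document. The sketch below assumes timestamps arrive as RFC 3339 strings ending in "Z" (as in `updateTime` or the context timestamp) with zero, three or six fractional digits; where the last applied timestamp is stored is left open, and the function names are hypothetical.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp such as "2023-01-23T10:00:00.123456Z"."""
    # datetime.fromisoformat only accepts the "Z" suffix from Python 3.11 on,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def should_apply(event_time: str, last_applied_time: Optional[str]) -> bool:
    """Last-write-wins: skip events older than what the DR copy already has."""
    if last_applied_time is None:
        return True  # nothing replicated yet for this document
    return parse_timestamp(event_time) > parse_timestamp(last_applied_time)


# Hypothetical timestamps: the second event was delivered late and is stale.
apply_first = should_apply("2023-01-23T10:00:05.000000Z", None)
apply_late = should_apply(
    "2023-01-23T10:00:01.000000Z", "2023-01-23T10:00:05.000000Z"
)
```

This guard only protects against stale overwrites; it does not make delivery exactly-once, so the peak-load testing mentioned above is still needed.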

Comparison of Option 1 & 2

I hope this comparative view helps in choosing the best DR option for your application.

Conclusion

Cloud Firestore is a Google-managed serverless NoSQL database, and its multi-region deployments should be the go-to choice when designing a DR strategy. In locations where only single-region deployments are currently possible, customers can still build reliable and highly available applications using either the Export/Import or the event-based replication approach.

Vinod Patel
Google Cloud - Community

Cloud Consultant at Google. Works closely with customers and developers to build data platform solutions in the cloud.