Back up your Firestore collections to secure Cloud Storage every night, the easy way, with Cloud Workflows. You don’t need to be a developer to set up the steps.
Database backups! We know how important they are: one wrong click and someone could delete a collection or the entire database. If a disaster recovery plan is activated, you need your backups to resume business operations.
Let’s make sure your Firestore/Datastore collections are backed up every night to secure storage.
There are various ways to trigger Cloud Firestore backups: the Datastore Import/Export UI, the firebase CLI tool, or the exportDocuments API. None of them is automated, however, and they all need developer assistance.
In this article, we are going to orchestrate automated backups with Cloud Workflows, store the exports in Cloud Storage, and trigger the workflow with Cloud Scheduler. These services are fully managed and serverless, and easy for non-developers to set up as well. Your project must have billing enabled.
- Create the Cloud Storage bucket
- Create the Cloud Workflow definition to execute Firestore export API call
- Set up IAM permissions to execute the Workflow
- Set up the nightly invocation via Cloud Scheduler
- Run the scheduler to see it in action.
You don’t need to be a developer to set up the steps.
Step 1: Create a Cloud Storage bucket
We need a Cloud Storage path to store our Firestore exports/backups. A GCS path looks like
- Name your bucket: eg:
- Choose where to store your data:
- Choose a default storage class:
- Choose how to control access to objects:
- Advanced settings:
Set a retention policy
- Retain objects for
Now that we have a bucket, the most important thing to understand is the last setting, the retention policy.
The retention policy specifies the minimum duration for which this bucket’s objects are protected from deletion or modification after they are uploaded. We set it to 1 month. With this option in place, if our account or another developer’s account is compromised and the attackers try to wipe out the backups, they won’t be able to, because every object is retained for 1 month. So even if we are on holiday or offline when the project is hacked, there is a good chance we will notice within 30 days and still have access to our backups.
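To make the retention window concrete, here is a minimal Python sketch of the arithmetic. It assumes "1 month" means 30 days, as the paragraph above suggests, and the function names are hypothetical helpers for illustration, not part of any Google Cloud library. Cloud Storage itself expresses the retention period in seconds.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "1 month" is treated as 30 days, as in the text above.
RETENTION_PERIOD = timedelta(days=30)


def retention_seconds(retention: timedelta = RETENTION_PERIOD) -> int:
    """Retention period as a whole number of seconds."""
    return int(retention.total_seconds())


def earliest_deletion_time(uploaded_at: datetime,
                           retention: timedelta = RETENTION_PERIOD) -> datetime:
    """First moment at which an object uploaded at `uploaded_at` may be deleted."""
    return uploaded_at + retention


def is_protected(uploaded_at: datetime, now: datetime,
                 retention: timedelta = RETENTION_PERIOD) -> bool:
    """True while the object is still inside its retention window."""
    return now < earliest_deletion_time(uploaded_at, retention)


if __name__ == "__main__":
    uploaded = datetime(2024, 1, 1, tzinfo=timezone.utc)  # hypothetical upload time
    print(retention_seconds())
    print(earliest_deletion_time(uploaded).isoformat())
```

A backup written on the first night stays undeletable until the window has passed, no matter which account asks for the deletion.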
On top of the minimum setup, you could set up lifecycle rules, such as:
- Set to Coldline 7+ days since object was updated
- Delete object 365+ days since object was updated
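If you prefer to keep these rules in a file rather than click through the console, the two rules above could be sketched as a Cloud Storage lifecycle configuration. The Python below only builds and prints the JSON; the function name is a hypothetical helper. One assumption to note: the `age` condition counts days since the object was created, which matches "since object was updated" for export files that are written once and never modified.

```python
import json


def build_lifecycle_config(coldline_after_days: int = 7,
                           delete_after_days: int = 365) -> dict:
    """Lifecycle rules: move objects to Coldline first, delete them later."""
    if coldline_after_days >= delete_after_days:
        raise ValueError("objects must move to Coldline before they are deleted")
    return {
        "rule": [
            {
                # Move objects to the cheaper Coldline storage class.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {"age": coldline_after_days},
            },
            {
                # Delete objects once they are old enough.
                "action": {"type": "Delete"},
                "condition": {"age": delete_after_days},
            },
        ]
    }


if __name__ == "__main__":
    print(json.dumps(build_lifecycle_config(), indent=2))
```

The delete age should stay longer than the retention period, otherwise the delete rule cannot take effect until retention has expired anyway.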
Remember your bucket name for later use. To turn it into a path, you need to add the gs:// prefix, for example:
Step 2: Create the Cloud Workflow definition to execute Firestore export API call
We are going to use an easy way to run our exports/backups. Some tools need a developer to set them up on the command line, and the UI requires someone to press a button to start each export/backup, so neither is automated. We opted for Cloud Workflows to run the export for us on a schedule that we will define later.
What is Cloud Workflows?
- Cloud Workflows lets you define pipelines and orchestrate steps using HTTP-based services
- Integrate any Google Cloud API, SaaS API, or private APIs
- Out of the box authentication support for Google Cloud products
- Fully managed service — requires no infrastructure or capacity planning
- Serverless with a pay-per-use pricing model
- Declarative workflow language using YAML syntax
Cloud Workflows to execute Firestore exports/backups:
As you can see in this firestoreExportDatabase.yaml file, we have an initialize step that sets the project (read automatically from the environment), the Firestore database ID (default), and the firestoreBackupBucket where exports/backups will be written.
Note: Right now Firestore users cannot generate their own databaseIds, so the default database is currently the glaringly literal string: (default), and yes, you have to include the parentheses.
You need to edit the sample so that it uses your own Cloud Storage bucket path. The rest of the YAML script doesn’t need any modification, so edit line 5 and you are good to go.
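To show what the export call amounts to, here is a hedged Python sketch that builds the request for the Firestore exportDocuments REST endpoint: a POST URL containing the project and database ID, and a JSON body whose outputUriPrefix is the bucket path. This is not the workflow file itself, it sends nothing over the network, and the function name, project ID, and bucket name are hypothetical placeholders.

```python
import json
from typing import Tuple

FIRESTORE_API = "https://firestore.googleapis.com/v1"


def build_export_request(project_id: str,
                         bucket_path: str,
                         database_id: str = "(default)") -> Tuple[str, str]:
    """Return the (url, json_body) pair for an exportDocuments call."""
    if not bucket_path.startswith("gs://"):
        raise ValueError("bucket_path must start with gs://")
    # The database ID is the literal string "(default)", parentheses included.
    url = (
        f"{FIRESTORE_API}/projects/{project_id}"
        f"/databases/{database_id}:exportDocuments"
    )
    # outputUriPrefix tells Firestore where to write the export files.
    body = json.dumps({"outputUriPrefix": bucket_path})
    return url, body


if __name__ == "__main__":
    # Hypothetical project and bucket names, for illustration only.
    url, body = build_export_request("my-project", "gs://my-firestore-backups")
    print("POST", url)
    print(body)
```

In the workflow, the same three values (project, database ID, bucket path) are the ones set in the initialize step, and authentication is handled for you by the workflow’s service account.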
To define a workflow, go to the Cloud Workflows page.
- You will be prompted to enable the Cloud Workflows API if you haven’t done so for your project. After you have enabled the API, open the console again.
- On the Cloud Workflow Dashboard, hit
- Set a workflow name and description:
- Choose region:
- You will notice there is a service account preselected for you, remember that. It may have the form of
- On the second page, paste the above snippet and make sure you edit line 5 to point to the bucket you created previously:
- By clicking Deploy, your workflow will get deployed.
At this point, your workflow should appear in the list on your Workflows page. You can run it now to check for syntax errors, but be aware that it will fail to execute the Firestore backup, because we haven’t yet given the service account the permissions that authorize Firestore/Datastore export/backup calls.
Step 3: Set up IAM permissions to execute the Cloud Workflow
Cloud Identity and Access Management (IAM) lets administrators authorize who can take action on specific resources, giving you full control and visibility to manage Google Cloud resources centrally.
The service account that was selected when you defined your Workflow in the previous step needs permissions.
We need to authorize it to run Firestore/Datastore exports and to write to Cloud Storage.
Go to the IAM Permissions page and find the service account in the list. Choose the Edit option from the menu on the right.
Add the following roles:
- Cloud Datastore Import Export Admin: full access to manage imports and exports.
- Storage Object Creator: access to create objects in GCS.
- Workflow Invoker: access to execute workflows and manage the executions.
Note: You can define a dedicated service account just for this task, or reuse the default “compute” service account. The default one also has an “Editor” role. Leave any roles that are already on the service account in place.
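For readers who manage IAM as code, the three roles above correspond to the role IDs in the sketch below. The Python only builds the policy bindings as plain data so you can see what is being granted. The function name and the service account email are hypothetical placeholders, and applying the bindings to your project is left out.

```python
from typing import Dict, List

# Console display names mapped to their IAM role IDs.
BACKUP_ROLES: Dict[str, str] = {
    "Cloud Datastore Import Export Admin": "roles/datastore.importExportAdmin",
    "Storage Object Creator": "roles/storage.objectCreator",
    "Workflow Invoker": "roles/workflows.invoker",
}


def build_bindings(service_account_email: str) -> List[dict]:
    """One IAM policy binding per role, all for the same service account."""
    member = f"serviceAccount:{service_account_email}"
    return [{"role": role, "members": [member]} for role in BACKUP_ROLES.values()]


if __name__ == "__main__":
    # Hypothetical service account, for illustration only.
    for binding in build_bindings("backup-sa@my-project.iam.gserviceaccount.com"):
        print(binding["role"], "->", binding["members"][0])
```

Keeping the list this short is the point: the account can start exports, write new objects, and invoke the workflow, and nothing in these three roles lets it delete existing backups.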
At this point, you can execute your workflow. The workflow shows a succeeded status once it has managed to trigger the Firestore export/backup process. The export itself takes time: depending on your database size, it can take anywhere from 2 to 15 minutes until you see a folder named after the date of execution in Cloud Storage. This confirms the output was created.
Step 4: Set up the nightly invocation via Cloud Scheduler
In this step we will set up the nightly scheduled execution that triggers our workflow. Go to the Cloud Scheduler page, enable the API if prompted, and revisit the Scheduler page.
To create a scheduled job, hit Create Job:
- Use a name and description for your scheduler eg:
- For the frequency, use this syntax to trigger every day at midnight:
0 0 * * *
- To generate complex trigger syntaxes see: https://crontab-generator.org/
- In the Target selector choose:
HTTP, and as the method choose
- Enter the below URL:
- You need to edit the above URL to replace the placeholders.
- Now, permissions. In the Show more section, configure the Auth header:
OAuth. Add the service account you used in Step 3, and for Scope use:
- Leave the other settings at their defaults.
PROJECT_ID: you can find it in the URL, e.g.:
WORKFLOW_NAME: the name of the workflow that you want to trigger, e.g.:
firestoreExportDocuments (or the name you’ve given to your Workflow)
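The scheduler settings above can be summarized in a short Python sketch that assembles the job as plain data. Several values here are assumptions rather than the article’s exact settings: the URL follows the pattern of the Workflows Executions API (creating an execution is a POST), the region and service account are hypothetical placeholders, and the cloud-platform scope is one commonly used scope for Google Cloud API calls. The function names are illustrative helpers only.

```python
WORKFLOW_EXECUTIONS_API = "https://workflowexecutions.googleapis.com/v1"
# Assumption: a broad scope commonly used for Google Cloud API calls.
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"


def build_execution_url(project_id: str, region: str, workflow_name: str) -> str:
    """URL that starts a new execution of the given workflow."""
    return (
        f"{WORKFLOW_EXECUTIONS_API}/projects/{project_id}"
        f"/locations/{region}/workflows/{workflow_name}/executions"
    )


def build_scheduler_job(project_id: str, region: str, workflow_name: str,
                        service_account_email: str,
                        schedule: str = "0 0 * * *") -> dict:
    """Describe a Cloud Scheduler HTTP job that triggers the workflow."""
    # Cron fields: minute, hour, day of month, month, day of week.
    # "0 0 * * *" therefore means minute 0 of hour 0, every day.
    if len(schedule.split()) != 5:
        raise ValueError("schedule must have exactly five cron fields")
    return {
        "schedule": schedule,
        "httpTarget": {
            "uri": build_execution_url(project_id, region, workflow_name),
            "httpMethod": "POST",
            "oauthToken": {
                "serviceAccountEmail": service_account_email,
                "scope": CLOUD_PLATFORM_SCOPE,
            },
        },
    }


if __name__ == "__main__":
    # All identifiers below are hypothetical placeholders.
    job = build_scheduler_job(
        "my-project", "us-central1", "firestoreExportDocuments",
        "backup-sa@my-project.iam.gserviceaccount.com",
    )
    print(job["httpTarget"]["uri"])
```

Changing the schedule argument to, say, a weekly expression is the only edit needed if nightly backups turn out to be more frequent than your organization requires.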
Step 5: Run the scheduler to see it in action
In the list of Cloud Scheduler entries, you can hit Run now. Cloud Scheduler triggers the Cloud Workflow, and the workflow executes the Firestore Export Documents API call, which reads all collections and creates an export in the Cloud Storage bucket you defined. A successful export is also logged on the Firestore/Datastore Import/Export page.
Depending on your data size, the whole operation can take between 2 and 15 minutes.
Exports/backups incur costs, because every document in a collection is read in order to create the export/backup. As Firestore is a serverless product, all these operations count as “reads” and will be part of your monthly bill. Based on this, you can choose the frequency of the backups: once a day or once a week, for example, depending on your organization’s policies and the risk it is willing to accept in an emergency.
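As a rough way to compare backup frequencies, the read volume can be estimated as one read per document per export. The Python sketch below does only that arithmetic; the function names are hypothetical, and the price is a parameter you must fill in from the current Firestore pricing page (the value used in the example is a placeholder, not a quoted price).

```python
def monthly_export_reads(document_count: int, exports_per_month: int) -> int:
    """Reads billed per month, assuming one read per exported document."""
    return document_count * exports_per_month


def estimate_monthly_cost(document_count: int, exports_per_month: int,
                          price_per_100k_reads: float) -> float:
    """Estimated monthly export cost for a given price per 100,000 reads."""
    reads = monthly_export_reads(document_count, exports_per_month)
    return reads / 100_000 * price_per_100k_reads


if __name__ == "__main__":
    docs = 50_000             # hypothetical database size
    placeholder_price = 0.06  # placeholder, check the current pricing page
    for label, runs in (("nightly", 30), ("weekly", 4)):
        reads = monthly_export_reads(docs, runs)
        cost = estimate_monthly_cost(docs, runs, placeholder_price)
        print(f"{label}: {reads} reads, about {cost:.2f} per month")
```

The estimate scales linearly with both the number of documents and the number of runs, so moving from nightly to weekly backups cuts the read volume to roughly a seventh.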
The output format is not human-readable: the files are split into many parts, and there is a manifest file that describes the schema and format. You can import/restore a backup into a Firestore/Datastore instance manually from the Datastore UI, but this should be done by your developers.
We have explored how to create a fully managed, serverless, automatically triggered Workflow that calls the Firestore export/backup API and places the export in a Cloud Storage bucket. This way you ensure a proper backup for disaster recovery.
As it’s serverless, there are no SDK tools to maintain and no libraries to update, and even a non-developer can set it up.
If you are a developer, we recommend using VSCode: there you can set up the GCP Project Switcher extension and define IDE tasks to deploy and execute the workflow and to describe its executions.