Cross-Cloud Backup: How to Back Up PostgreSQL Databases from GCP to Azure for a SaaS Application
In today’s multi-cloud world, managing backups across cloud platforms can be challenging, especially for a multi-tenant SaaS application. If you’re using Google Cloud Platform (GCP) for your databases but want to store backups in Microsoft Azure, you’ll need to orchestrate the process yourself, as GCP offers no direct cross-cloud backup solution.
In this blog, I’ll walk you through designing an architecture to back up PostgreSQL databases from GCP to Azure using a tenant-per-database model. We’ll skip trial tenants, focus on active databases, and automate the entire process.
Problem Statement
You want to:
- Automatically back up tenant-specific PostgreSQL databases hosted on GCP.
- Store backups temporarily in GCP Cloud Storage.
- Transfer those backups to Azure Blob Storage for long-term storage.
- Implement region-based scheduling and retention policies for efficiency and cost control.
Let’s break this down step by step.
Architecture Overview
Here’s a quick summary of the process:
- GCP: Retrieve active tenant databases, create backups, and store them in Google Cloud Storage.
- Azure: Use Azure Data Factory to transfer backups from GCP to Azure Blob Storage.
- Efficiency: Use scheduling and expiry policies to optimize backup timing and storage costs.
Process
Step 1: Retrieve Active Tenant Databases
In a tenant-per-database SaaS model, each tenant has its own PostgreSQL database. The first step is to retrieve a list of active tenants, skipping trial tenants (for cost reduction).
Google Cloud Function:
- This function connects to the catalog (master) database.
- It retrieves the list of active tenant databases and their connection details (excluding trial tenants).
- Pushes the database details into a Google Pub/Sub queue for further processing; each database’s details are published as a separate message.
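Here is a minimal sketch of the retrieval part, assuming a hypothetical catalog schema with a tenants table that carries connection details plus is_active and is_trial flags (your table and column names will differ); the Pub/Sub publish step is sketched under Step 2.

```python
# Hypothetical catalog schema: a "tenants" table with columns
# tenant_id, db_name, db_host, region, is_active, is_trial.
import os
import psycopg2

def get_active_tenant_databases():
    """Return connection details for active, non-trial tenant databases."""
    conn = psycopg2.connect(
        host=os.environ["CATALOG_DB_HOST"],
        dbname=os.environ["CATALOG_DB_NAME"],
        user=os.environ["CATALOG_DB_USER"],
        password=os.environ["CATALOG_DB_PASSWORD"],
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT tenant_id, db_name, db_host, region
                FROM tenants
                WHERE is_active = TRUE AND is_trial = FALSE
                """
            )
            rows = cur.fetchall()
    finally:
        conn.close()

    return [
        {"tenant_id": r[0], "db_name": r[1], "db_host": r[2], "region": r[3]}
        for r in rows
    ]
```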
Step 2: Push Database Info to Pub/Sub Queue
With active tenant databases fetched, we push the information to a Google Pub/Sub queue. This step allows us to asynchronously process each database, decoupling the retrieval and backup steps.
- You can implement region-based scheduling to back up databases during non-peak hours. For example, databases in us-central1 could be backed up at night to minimize impact on performance.
- Add a time delay between processing each database to avoid overloading the system (e.g., a 2-minute gap before each item is handled off the Pub/Sub queue); one way to implement this is sketched below.
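One way to combine region-based scheduling with a processing gap is to have a Cloud Scheduler job per region call the publisher during that region’s off-peak window, and have the publisher pause between messages. The sketch below assumes the tenant list produced by the Step 1 sketch and a topic named tenant-backup-queue; both are placeholders.

```python
# Hypothetical publisher. A Cloud Scheduler job per region invokes this during
# that region's off-peak window and passes the region name.
import json
import os
import time

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
TOPIC_PATH = publisher.topic_path(os.environ["GCP_PROJECT"], "tenant-backup-queue")

def publish_backup_jobs(tenants, region, delay_seconds=120):
    """Publish one message per tenant database in the given region, pausing
    between messages so the downstream backups are staggered.

    "tenants" is the list returned by get_active_tenant_databases() in Step 1.
    """
    for tenant in (t for t in tenants if t["region"] == region):
        payload = json.dumps(tenant).encode("utf-8")
        future = publisher.publish(TOPIC_PATH, payload, region=region)
        future.result()            # block until the message is accepted
        time.sleep(delay_seconds)  # e.g., 2 minutes between tenants
```

Note that sleeping inside the publisher keeps the function running for roughly two minutes per tenant, which can hit function timeouts for larger tenant counts; Cloud Tasks with scheduled delivery is a common alternative if that becomes a problem.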
Step 3: Backup Tenant Databases and Store in Google Cloud Storage
Now that the database details are in Pub/Sub, the next step is to back up each tenant database and store the backup in Google Cloud Storage.
- Google Cloud Function / Azure Function (Subscriber):
- This function subscribes to the Pub/Sub queue.
- For each tenant database, it runs pg_dump to create a backup.
- It stores the compressed backup file in a Google Cloud Storage bucket (see the sketch below).
- Set an expiry (lifecycle) policy on the bucket so backup files are automatically deleted after 3 days.
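Here is a minimal sketch of the GCP variant of the subscriber, assuming a first-generation Pub/Sub-triggered Cloud Function signature (event, context), credentials supplied via environment variables (ideally backed by Secret Manager), and a staging bucket name in BACKUP_BUCKET; all of these are placeholders.

```python
# Hypothetical Pub/Sub-triggered function: dump one tenant database and upload
# the backup to the GCS staging bucket.
import base64
import json
import os
import subprocess

from google.cloud import storage

BACKUP_BUCKET = os.environ["BACKUP_BUCKET"]

def backup_tenant_database(event, context):
    """Entry point: decode one tenant message, run pg_dump, upload to GCS."""
    tenant = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    dump_path = f"/tmp/{tenant['db_name']}.dump"

    # pg_dump's custom format (-Fc) is compressed by default.
    subprocess.run(
        [
            "pg_dump",
            "--host", tenant["db_host"],
            "--username", os.environ["TENANT_DB_USER"],
            "--dbname", tenant["db_name"],
            "--format", "custom",
            "--file", dump_path,
        ],
        env={**os.environ, "PGPASSWORD": os.environ["TENANT_DB_PASSWORD"]},
        check=True,
    )

    # Upload to the staging bucket; the bucket's lifecycle rule deletes it after 3 days.
    blob_name = f"backups/{tenant['tenant_id']}/{tenant['db_name']}.dump"
    storage.Client().bucket(BACKUP_BUCKET).blob(blob_name).upload_from_filename(dump_path)
    os.remove(dump_path)
```

Keep in mind that the standard Cloud Functions runtime does not ship the pg_dump binary, so in practice this step often runs as a container on Cloud Run (or packages the PostgreSQL client tools some other way).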
Step 4: Transfer Backup Files to Azure Blob Storage
Once the backups are in GCP Cloud Storage, we need to transfer them to Azure Blob Storage for long-term storage. This is where Azure Data Factory (ADF) comes into play.
- Azure Data Factory:
- Create a pipeline in ADF that connects to the GCP Cloud Storage bucket.
- Copy the backup files from GCP to Azure Blob Storage.
- Apply a retention policy on the Azure Blob Storage container to automatically delete files after 60 days.
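The copy pipeline itself is typically built in ADF Studio (or exported as JSON/ARM templates), so it isn’t shown here. For the retention piece, the sketch below uses the azure-mgmt-storage SDK to add a lifecycle rule that deletes blobs 60 days after their last modification; the subscription, resource group, storage account, and pg-backups/ prefix are placeholders.

```python
# Hedged sketch: apply a 60-day delete rule to the destination storage account.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient
from azure.mgmt.storage.models import (
    DateAfterModification,
    ManagementPolicy,
    ManagementPolicyAction,
    ManagementPolicyBaseBlob,
    ManagementPolicyDefinition,
    ManagementPolicyFilter,
    ManagementPolicyRule,
    ManagementPolicySchema,
)

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

rule = ManagementPolicyRule(
    name="delete-backups-after-60-days",
    enabled=True,
    type="Lifecycle",
    definition=ManagementPolicyDefinition(
        actions=ManagementPolicyAction(
            base_blob=ManagementPolicyBaseBlob(
                delete=DateAfterModification(days_after_modification_greater_than=60)
            )
        ),
        # Only target the prefix that holds the transferred backups.
        filters=ManagementPolicyFilter(blob_types=["blockBlob"], prefix_match=["pg-backups/"]),
    ),
)

client.management_policies.create_or_update(
    "<resource-group>",
    "<storage-account>",
    "default",
    ManagementPolicy(policy=ManagementPolicySchema(rules=[rule])),
)
```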
Additional Tips for Optimization
Efficient Scheduling:
Schedule backups by region to avoid peak times and minimize performance impact. You can implement this logic in the function that pushes data to Pub/Sub.
Time Gaps Between Backups:
Add a delay (e.g., 2 minutes) between processing each tenant’s database to prevent overwhelming the system.
File Expiry:
Set expiration dates on the backups stored in GCP (e.g., 3 days) and Azure (e.g., 60 days) to minimize storage costs.
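On the GCP side, the 3-day expiry can be set once on the staging bucket, for example with the google-cloud-storage client (the bucket name is a placeholder); the Azure side is covered by the lifecycle rule sketched in Step 4.

```python
# Hedged sketch: delete objects in the staging bucket 3 days after creation.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("tenant-backup-staging")

bucket.add_lifecycle_delete_rule(age=3)  # age is in days
bucket.patch()                           # persist the updated lifecycle rules
```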
Monitoring and Alerts:
Implement monitoring tools like Google Cloud Monitoring and Azure Monitor to track backup progress and trigger alerts for failures.
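A lightweight option on the GCP side is to emit structured JSON logs from the backup function and attach a log-based alert in Cloud Monitoring to the failure entries; the field names below are illustrative, not a fixed schema.

```python
# Hedged sketch: Cloud Logging parses JSON lines printed by a function and
# honors the "severity" and "message" fields, which a log-based alert can match.
import json

def log_backup_result(tenant_id: str, succeeded: bool, detail: str = ""):
    """Print one structured log line per backup attempt."""
    print(json.dumps({
        "severity": "INFO" if succeeded else "ERROR",
        "message": f"backup {'succeeded' if succeeded else 'failed'} for tenant {tenant_id}",
        "tenant_id": tenant_id,
        "detail": detail,
    }))
```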
Conclusion
By combining Google Cloud Functions, Pub/Sub, Azure Data Factory, and Blob Storage, you can automate cross-cloud PostgreSQL database backups efficiently. This architecture ensures that tenant databases are backed up, transferred, and stored according to your desired schedules and retention policies.
This approach is flexible and can be tailored to meet your application’s specific needs. As multi-cloud strategies continue to evolve, having a solid backup strategy across platforms is critical for disaster recovery and data redundancy.
Have you implemented cross-cloud backups for your SaaS application? Let me know your thoughts and experiences in the comments!