Disaster Recovery on Google Cloud for Data (Part 1)

Get Cooking in Cloud

Priyanka Vergadia
Google Cloud - Community
7 min read · Jan 9, 2020

Introduction

Get Cooking in Cloud is a blog and video series that helps enterprises and developers build business solutions on Google Cloud. In this second miniseries, I am covering disaster recovery (DR) on Google Cloud. Disasters can be hard to deal with when you have an online presence. These articles explain how to deal with disasters such as earthquakes, power outages, floods, and fires.

Here are all the articles in this miniseries for you to check out.

  1. Disaster Recovery Overview
  2. Cold Disaster recovery on Google Cloud for on-premise applications
  3. Warm Disaster recovery on Google Cloud for on-premise applications
  4. Hot Disaster recovery on Google Cloud for on-premise applications
  5. Cold Disaster recovery for applications in Google Cloud
  6. Warm Disaster recovery for applications in Google Cloud
  7. Hot Disaster recovery for applications in Google Cloud
  8. Disaster recovery on Google Cloud for Data: Part 1 (This article)
  9. Disaster recovery on Google Cloud for Data: Part 2

Since data is the most important part of any application’s recovery, it makes sense to talk about it specifically, so this article and the next focus on DR for data. In this article, you will learn to plan for data recovery when the production environment is on-premises or in another cloud. In the next one, you will learn to plan for data recovery when the production environment is on Google Cloud. So, read on!

What you’ll learn

  • Why is data backup important?
  • What does data recovery entail?
  • Data backup and recovery
  • Database backup and recovery
  • Data recovery if production is in another cloud

Prerequisites

  • Basic concepts and constructs of Google Cloud so you can recognize the names of the products.
  • Read the overview article for DR-related definitions.

Check out the video

Disaster Recovery on Google Cloud for data that is in on-premises production environment

Why is data backup important?

Mane-Street-Style is an e-commerce company that has its production environment on-premises. Imagine they discover after a disaster that they have lost all their recent customer orders. Unfulfilled orders would be a HUGE loss for their business. Since data is a critical piece of their application, let’s dive deeper into it and help Mane-Street-Style with some strategies to avoid losing data during a disaster!

What does “Data” recovery entail?

You can only recover data if you have backed it up somewhere. But what do backups mean when it comes to data?

The term backup, with regard to data, covers two scenarios:

  • Data backups: Backing up data alone involves copying a discrete amount of data from one place to another so that it can be restored after corruption or when production is down.
  • Database backups: Database (DB) backups are slightly more complex because they typically involve recovering to a point in time. So we need to back up not just the DB but also its transaction logs, and then apply those logs to the restored DB backup during recovery.
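
To make the point-in-time idea concrete, here is a minimal Python sketch of how a recovery plan could be chosen: pick the newest full backup taken at or before the target time, then replay the transaction log segments that cover the gap up to the target. The `FullBackup` and `LogSegment` types and the `plan_point_in_time_restore` function are hypothetical names invented for this illustration; a real database management system ships its own tooling for this.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FullBackup:
    name: str
    taken_at: datetime


@dataclass(frozen=True)
class LogSegment:
    name: str
    start: datetime  # time of the first transaction in this segment
    end: datetime    # time of the last transaction in this segment


def plan_point_in_time_restore(backups, logs, target):
    """Return (base_backup, log_segments_to_replay) for restoring to `target`."""
    candidates = [b for b in backups if b.taken_at <= target]
    if not candidates:
        raise ValueError("no full backup exists at or before the target time")
    # Start from the newest full backup that is not later than the target.
    base = max(candidates, key=lambda b: b.taken_at)
    # Replay, oldest first, every segment that overlaps (base.taken_at, target].
    needed = sorted(
        (s for s in logs if s.end > base.taken_at and s.start <= target),
        key=lambda s: s.start,
    )
    return base, needed
```

The more frequently log segments are shipped off-site, the closer the restorable target time can be to the moment of the disaster.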

Now that we have a basic understanding of data and database backups for DR, let’s consider Mane-Street-Style’s scenario and how they can set up DR specifically for data.

Data Backup and Recovery

For data backup and recovery, there are a few options:

  • Mane-Street-Style can create a scheduled task that runs a script or an application to transfer the data to Cloud Storage. They can automate this backup process using the gsutil command-line tool, Cloud Composer, or one of the Cloud Storage client libraries.
  • They can use a third-party solution. Mane-Street-Style uses the common tiered storage pattern, keeping the most recent backups on faster storage and gradually migrating older backups to cheaper storage. With Google Cloud as the target, they can use Cloud Storage Nearline or Coldline as the equivalent of the slower tier. One way to do this is to place a partner gateway between on-premises storage and Cloud Storage. Partner solutions manage the transfer from an on-premises network-attached storage (NAS) appliance or storage area network (SAN).
Data Recovery using partner gateway
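
The tiered pattern described above can also be expressed directly on a bucket with a Cloud Storage lifecycle configuration. The Python sketch below only builds that configuration as a dictionary. The function name `tiering_lifecycle_config` and the 30-day and 90-day thresholds are assumptions chosen for illustration, not values from Mane-Street-Style’s setup.

```python
import json


def tiering_lifecycle_config(nearline_after_days=30, coldline_after_days=90):
    """Build a lifecycle config that moves aging backups to cheaper classes.

    The day thresholds are illustrative assumptions; tune them to your own
    retention and restore-time requirements.
    """
    if not 0 <= nearline_after_days < coldline_after_days:
        raise ValueError("expected 0 <= nearline_after_days < coldline_after_days")
    return {
        "rule": [
            {
                # Recent backups start in Standard, then move to Nearline.
                "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
                "condition": {
                    "age": nearline_after_days,
                    "matchesStorageClass": ["STANDARD"],
                },
            },
            {
                # Older backups move on from Nearline to Coldline.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                "condition": {
                    "age": coldline_after_days,
                    "matchesStorageClass": ["NEARLINE"],
                },
            },
        ]
    }


if __name__ == "__main__":
    # Print the JSON so it can be saved to a file such as lifecycle.json.
    print(json.dumps(tiering_lifecycle_config(), indent=2))
```

Saved to a file, the output could then be applied with `gsutil lifecycle set lifecycle.json gs://BUCKET_NAME`, where `BUCKET_NAME` is a placeholder for the backup bucket.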

Database Backup and Recovery

Mane-Street-Style can use a number of strategies to recover a database system from on-premises to Google Cloud. Let’s look at two of the most common solutions:

  • Backup and recovery using a recovery server on Google Cloud.
  • Replication to a standby server on Google Cloud.

Backup and recovery server approach:

Steps to be taken “before” disaster

  • Create a database backup using the built-in backup mechanisms of the database management system. This typically writes the backup to a local disk.
  • Create a Cloud Storage bucket as the target for the data backup in Google Cloud.
  • Copy the backup files to Cloud Storage (GCS) using gsutil or one of the partner gateway solutions we looked at earlier.
  • Because this is a database, also copy the transaction logs to the recovery site on Google Cloud. Having a backup of the transaction logs helps keep the RPO values small.
  • Configure connectivity between on-premises and Google Cloud using Cloud Interconnect or Cloud VPN.
  • Create a custom image of the database server on Google Cloud with exactly the same configuration as the one on-premises.
  • Start a minimally sized instance from the custom image and attach any persistent disks needed.
  • Set the auto-delete flag to no-auto-delete so that the persistent disk is not inadvertently deleted, since that would be a disaster of its own.
Database backup and recovery: Steps to be taken “before” disaster
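
A few of the steps above map onto gsutil and gcloud commands. The Python sketch below assembles those commands as argument lists and, by default, only prints them. Every bucket, directory, instance, disk, zone, and location value is a hypothetical placeholder, and the DBMS backup itself, the network connectivity, and the custom image are not covered.

```python
import subprocess


def pre_disaster_commands(bucket, backup_dir, log_dir, instance, disk, zone, location):
    """Return the CLI invocations for the bucket, copy, and auto-delete steps."""
    bucket_url = f"gs://{bucket}"
    return [
        # One-time setup: create the target bucket in the chosen location.
        ["gsutil", "mb", "-l", location, bucket_url],
        # Recurring: sync the local DB backups and transaction logs.
        ["gsutil", "-m", "rsync", "-r", backup_dir, f"{bucket_url}/backups"],
        ["gsutil", "-m", "rsync", "-r", log_dir, f"{bucket_url}/transaction-logs"],
        # One-time setup: keep the disk if the instance is ever deleted.
        ["gcloud", "compute", "instances", "set-disk-auto-delete", instance,
         f"--disk={disk}", "--no-auto-delete", f"--zone={zone}"],
    ]


def run_all(commands, dry_run=True):
    """Print each command, or execute it and stop on the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

In a scheduled task, only the two `rsync` commands would need to run repeatedly. The bucket creation and the auto-delete setting are one-time steps.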

Steps to be taken “during” disaster

When the time comes to recover the database to the DR site on Google Cloud, it’s easy for Mane-Street-Style:

  • Apply the latest backup file and transaction logs that were copied to Cloud Storage.
  • Replace the minimal instance with a larger instance that is capable of accepting production traffic.
  • And finally, switch clients to point at the recovered database in Google Cloud.
Database backup and recovery: Steps to be taken “during” disaster
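
One possible reading of "replace the minimal instance with a larger instance" is to resize the existing recovery instance in place, which on Compute Engine means stopping it, changing its machine type, and starting it again. The Python sketch below builds those gcloud commands under that assumption. The instance name, zone, and machine type passed in are hypothetical, and the database restore itself is DBMS-specific and not shown.

```python
def resize_instance_commands(instance, zone, machine_type):
    """Return gcloud invocations that resize a VM in place.

    Assumes the database restore has already finished, because the machine
    type can only be changed while the instance is stopped.
    """
    zone_flag = f"--zone={zone}"
    return [
        ["gcloud", "compute", "instances", "stop", instance, zone_flag],
        ["gcloud", "compute", "instances", "set-machine-type", instance,
         f"--machine-type={machine_type}", zone_flag],
        ["gcloud", "compute", "instances", "start", instance, zone_flag],
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for cmd in resize_instance_commands("dr-db", "us-central1-a", "n1-standard-8"):
        print(" ".join(cmd))
```

Creating a new, larger instance from the same custom image and attaching the preserved persistent disk would be another way to carry out this step.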

Steps to be taken “after” disaster

When the on-premises production environment is up and running again, they just have to reverse the steps:

  • Take a backup of the database running on Google Cloud, along with its transaction logs.
  • Copy these backup files to the production environment.
  • Apply them to the production database system.
  • Prevent clients from connecting to the database system in Google Cloud, for example by stopping the database system service. From this point on, the application is unavailable until the restore to the production environment is complete.
  • Finally, redirect client connections to the production environment, and that’s it!
Database backup and recovery: Steps to be taken “after” disaster

Standby server approach:

An alternative recipe is to set up a standby server on Google Cloud for data replication. This helps achieve very small RTO and RPO values because it replicates data and database state in real time to a hot standby of the database server. If Mane-Street-Style were to set up a standby server, here is how they would do it:

  • First connect the on-premises network and the Google Cloud network.
  • Then create a custom image of the database server on Google Cloud with the same configuration as the one on-premises.
  • Start an instance from the custom image and attach any persistent disks that are needed, with the auto-delete flag set to false.
  • Then configure replication between the on-premises database server and the target database server in Google Cloud.
  • In normal operation, clients are configured to point to the database server on-premises.
Standby server approach: Steps to be taken “before” disaster
  • During a disaster, with this replication topology already in place, they would just have to switch clients to point to the standby server running in the Google Cloud network.
  • When the on-premises production DB is up and running again, they resynchronize the production database server with the Google Cloud database server and then switch clients to point back to the production environment.
Standby server approach: Steps to be taken “after” disaster
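
With a standby server, the data that could be lost in a failover is bounded by how far the standby lags behind the primary. The Python sketch below shows that check in its simplest form. The function names are hypothetical, and how the two timestamps are obtained depends on the database management system and is not shown.

```python
from datetime import datetime, timedelta


def replication_lag(primary_commit_time, standby_replay_time):
    """Return how far the standby trails the primary (never negative)."""
    return max(primary_commit_time - standby_replay_time, timedelta(0))


def meets_rpo(primary_commit_time, standby_replay_time, rpo):
    """True if failing over now would lose no more data than the RPO allows."""
    return replication_lag(primary_commit_time, standby_replay_time) <= rpo
```

For example, a standby that last replayed a transaction 5 seconds before the primary’s latest commit meets a 30-second RPO, while one that is 2 minutes behind does not.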

What if the production is in another cloud?

In the case of Mane-Street-Style, production is set up on-premises. But if they had production set up in AWS, they could use Storage Transfer Service to transfer objects from Amazon S3 to Cloud Storage.

They could set up a transfer job to schedule periodic synchronization from the data source to the data sink, with advanced filters based on file creation dates, filename filters, and the times of day they prefer to import data. They could also use tools like Apache Airflow to move data between clouds.
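
As a rough illustration, the Python sketch below builds the kind of request body a scheduled Amazon S3 to Cloud Storage transfer job might use. The field names follow my understanding of the Storage Transfer Service `transferJobs` REST resource and should be checked against the current API reference. The function name is hypothetical, all values passed in are placeholders, and the AWS credentials that a real job needs are deliberately left out.

```python
import json
from datetime import date


def s3_to_gcs_transfer_job(project_id, s3_bucket, gcs_bucket, start, hour_utc,
                           include_prefixes=()):
    """Build a transfer job body (AWS credentials intentionally omitted)."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    spec = {
        "awsS3DataSource": {"bucketName": s3_bucket},
        "gcsDataSink": {"bucketName": gcs_bucket},
    }
    if include_prefixes:
        # Filename filter: only objects whose names start with these prefixes.
        spec["objectConditions"] = {"includePrefixes": list(include_prefixes)}
    return {
        "description": f"DR sync from s3://{s3_bucket} to gs://{gcs_bucket}",
        "projectId": project_id,
        "status": "ENABLED",
        "schedule": {
            "scheduleStartDate": {
                "year": start.year, "month": start.month, "day": start.day,
            },
            # Preferred time of day for the import, in UTC.
            "startTimeOfDay": {"hours": hour_utc, "minutes": 0},
        },
        "transferSpec": spec,
    }


if __name__ == "__main__":
    job = s3_to_gcs_transfer_job(
        "example-project", "example-orders-s3", "example-orders-dr",
        date(2020, 1, 9), hour_utc=2, include_prefixes=["orders/"],
    )
    print(json.dumps(job, indent=2))
```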

Conclusion

If you have a production application on-premises or in another cloud and need to set up data recovery, then hopefully you learned some strategies to do that! Stay tuned for the next article, where you will learn more about setting up DR for data on Google Cloud.

Priyanka Vergadia
Google Cloud - Community

Developer Advocate @Google, Artist & Traveler! Twitter @pvergadia