Cold Disaster recovery for applications in Google Cloud

Get Cooking in Cloud

Priyanka Vergadia
Google Cloud - Community
Nov 16, 2019

Introduction

“Get Cooking in Cloud” is a blog and video series to help enterprises and developers build business solutions on Google Cloud. In this second miniseries, I am covering Disaster Recovery (DR) on Google Cloud. Disasters can be hard to deal with when you have an online presence. In the next few blogs, we will elaborate on how to deal with disasters such as earthquakes, power outages, floods, and fires. If you are interested in the prior miniseries, check out this.

Here is the plan for the series.

  1. Disaster Recovery Overview
  2. Cold Disaster recovery on Google Cloud for on-premise applications
  3. Warm Disaster recovery on Google Cloud for on-premise applications
  4. Hot Disaster recovery on Google Cloud for on-premise applications
  5. Cold Disaster recovery for applications in Google Cloud (This article)
  6. Warm Disaster recovery for applications in Google Cloud
  7. Hot Disaster recovery for applications in Google Cloud
  8. Disaster recovery on Google Cloud for Data: Part 1
  9. Disaster recovery on Google Cloud for Data: Part 2

In this article, you will learn to set up a Cold DR pattern for your applications that are deployed in Google Cloud. So, read on!

What you’ll learn

  • Cold DR pattern for Google Cloud applications, with an example.
  • Steps to be taken before a disaster hits.
  • Steps to be taken during a disaster.
  • How default High Availability (HA) works.
  • Steps to be taken after a disaster.

Prerequisites

  • Familiarity with basic Google Cloud concepts and constructs, so you can recognize the product names.
  • Read the overview article for DR-related definitions.

Cold DR pattern for applications deployed on Google Cloud

Let’s learn the Cold DR pattern with an example

Mane-street-art has migrated to Google Cloud, but they still need a DR plan. They would like to set up a Cold DR pattern with one recoverable application server. Depending on your architecture, a single recoverable server may or may not work, but consider this as an example.

In any DR pattern, you need to understand what steps to take before a disaster hits, what happens when a disaster hits, and what needs to happen after the disaster has passed.

Cold DR Pattern — How does it work?

Steps to be taken before disaster hits:

  • Create a VPC network
  • Create a custom image that’s configured with the application service. As part of the image configuration, make sure a persistent disk is attached for the data being processed.
  • Create a snapshot from the attached persistent disk.
  • Configure a startup script to create a persistent disk from the latest snapshot and to mount the disk.
  • Create an instance template from the custom image you just created.
  • Using this instance template, configure a regional managed instance group with a target size of one.
  • Make sure health checks are configured on the managed instance group (MIG).
  • Configure internal load balancing using the regional managed instance group.
  • Configure a scheduled task to create regular snapshots of the persistent disk.
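
To make the startup-script step more concrete, here is a minimal Python sketch of the decision it has to make: pick the newest usable snapshot and describe the disk to create from it. This is not the actual startup script, and it makes no Google Cloud calls. The snapshot records, the names `app-data-00x` and `app-server`, and the helper functions are all hypothetical; the field names are chosen to resemble the Compute Engine snapshot resource.

```python
from datetime import datetime

# Hypothetical snapshot records. Field names resemble the Compute Engine
# snapshot resource, but every value here is made up for illustration.
SNAPSHOTS = [
    {"name": "app-data-001", "status": "READY",
     "creationTimestamp": "2019-11-15T02:00:00+00:00"},
    {"name": "app-data-002", "status": "READY",
     "creationTimestamp": "2019-11-16T02:00:00+00:00"},
    {"name": "app-data-003", "status": "CREATING",
     "creationTimestamp": "2019-11-16T08:00:00+00:00"},
]


def latest_ready_snapshot(snapshots):
    """Return the newest snapshot that has finished and is safe to restore."""
    ready = [s for s in snapshots if s["status"] == "READY"]
    if not ready:
        raise LookupError("no READY snapshot available to restore from")
    return max(
        ready,
        key=lambda s: datetime.fromisoformat(s["creationTimestamp"]),
    )


def restore_plan(snapshots, instance_name):
    """Describe the disk a startup script would create and then mount."""
    snap = latest_ready_snapshot(snapshots)
    return {
        "disk_name": f"{instance_name}-data",  # assumed naming convention
        "source_snapshot": snap["name"],
    }


plan = restore_plan(SNAPSHOTS, "app-server")
print(plan)
# {'disk_name': 'app-server-data', 'source_snapshot': 'app-data-002'}
```

Note that the newest record, `app-data-003`, is skipped because it is still being created. A real script would likely want a similar guard so it never restores from an incomplete snapshot.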

Steps to be taken when disaster hits:

  • Mane-street-art does not need to initiate any failover steps, because they occur automatically. That is the best part of the default HA features available in Google Cloud.

How does default HA work?

  • The load balancer ensures that even when a replacement instance is needed, the same IP address is used to front the application server.
  • The instance template and custom image ensure that the replacement instance is configured identically to the instance it is replacing.
  • Mane-street-art’s RPO will be determined by the last snapshot taken. The more often the snapshots are taken, the smaller the RPO value.
  • The regional managed instance group provides HA in depth. It provides mechanisms to react to failures at the application, instance, or zone level. You don’t have to manually intervene if any of those scenarios occur. Setting a target size of 1 ensures you only ever have one instance running.
  • Persistent disks are zonal, so snapshots are required to re-create disks after a zonal failure. Snapshots are also available across regions, which lets you restore a disk to a different region as easily as to the same region.
  • In the event of a zonal failure, the regional instance group launches a replacement instance in a different zone in the same region. A new persistent disk is created from the latest snapshot and attached to the new instance.
  • You could use a regional persistent disk instead of a zonal one, so you won’t need snapshots to restore the persistent disk during recovery. Be mindful, though, that this consumes twice as much storage, which needs to be budgeted.
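
The relationship between snapshot frequency, zonal failover, and RPO can be sketched with a small simulation. This is a toy model, not how the managed instance group is implemented: the zone list, the six-hour schedule, the failure time, and the function `simulate_zonal_failover` are assumptions for illustration, and the real service chooses the replacement zone itself.

```python
from datetime import datetime, timedelta

# Example zones in one region; used here only as illustrative data.
REGION_ZONES = ["us-central1-a", "us-central1-b", "us-central1-c"]


def simulate_zonal_failover(failed_zone, snapshot_times, failure_time,
                            zones=REGION_ZONES):
    """Model a replacement instance being restored from the latest snapshot."""
    healthy = [z for z in zones if z != failed_zone]
    if not healthy:
        raise RuntimeError("no healthy zone left in the region")

    # Only snapshots taken before the failure can be used for recovery.
    usable = [t for t in snapshot_times if t <= failure_time]
    if not usable:
        raise RuntimeError("no snapshot was taken before the failure")

    restore_point = max(usable)
    return {
        # Assumption: pick the first healthy zone; the real choice is
        # made by the managed instance group.
        "replacement_zone": healthy[0],
        "restore_point": restore_point,
        # Data written after the restore point is lost.
        "data_loss_window": failure_time - restore_point,
    }


# Hypothetical schedule: a snapshot every 6 hours, starting at midnight.
interval = timedelta(hours=6)
start = datetime(2019, 11, 16, 0, 0)
snapshot_times = [start + i * interval for i in range(4)]

result = simulate_zonal_failover(
    failed_zone="us-central1-a",
    snapshot_times=snapshot_times,
    failure_time=datetime(2019, 11, 16, 15, 30),
)
print(result["replacement_zone"])   # us-central1-b
print(result["data_loss_window"])   # 3:30:00
```

In this model the data loss window can never exceed the snapshot interval, so halving the interval halves the worst-case RPO, at the cost of taking more snapshots.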

Steps to be taken after the disaster has passed

  • Since Google Cloud provides default HA features, as long as the initial environment is set up for HA, there is not much to do during or after the disaster.

Conclusion

If your application is deployed on Google Cloud and you have a specific budget for meeting your RTO and RPO values, then use the Cold DR pattern! Stay tuned for upcoming articles, where you will learn to set up more DR patterns that make sense for your business.


Developer Advocate @Google, Artist & Traveler! Twitter @pvergadia