Hot Disaster recovery on Google Cloud for applications running on-premises

Get Cooking in Cloud

Priyanka Vergadia

Published in

Google Cloud - Community

5 min read · Nov 12, 2019

Introduction

“Get Cooking in Cloud” is a blog and video series to help enterprises and developers build business solutions on Google Cloud. In this second miniseries, I am covering disaster recovery (DR) on Google Cloud. Disasters can be hard to deal with when you have an online presence. In this series of blogs, we will elaborate on how to deal with disasters like earthquakes, power outages, floods, and fires. If you are interested in the previous miniseries, check it out.

Here is the plan for the series.

Disaster Recovery Overview
Cold Disaster recovery on Google Cloud for on-premise applications
Warm Disaster recovery on Google Cloud for on-premise applications
Hot Disaster recovery on Google Cloud for on-premise applications (This article)
Cold Disaster recovery for applications in Google Cloud
Warm Disaster recovery for applications in Google Cloud
Hot Disaster recovery for applications in Google Cloud
Disaster recovery on Google Cloud for Data: Part 1
Disaster recovery on Google Cloud for Data: Part 2

In this article, you will learn to set up a Hot DR pattern for your applications that are deployed on-premises. So, read on!

What you’ll learn

Hot DR pattern with an example
Steps to be taken before a disaster hits
Steps to be taken during a disaster
Steps to be taken after a disaster

Prerequisites

Basic concepts and constructs of Google Cloud so you can recognize the names of the products.
Read the overview article before continuing on.

Let’s learn the Hot DR pattern with an example

In the last two articles, we talked about Mane-street Art, which runs its applications on-premises and is building a DR infrastructure on Google Cloud. We saw that they started with a Cold DR plan and moved to a Warm standby because they needed lower RTO and RPO values.

Now, Mane-street Art has become really popular and cannot afford to be down for even a few seconds. Since the requirement is near-zero RTO and RPO values, the only way to get there is to run a highly available (HA) architecture across the on-premises production environment and Google Cloud concurrently.

Note: If you are unfamiliar with the terms used here (RTO, RPO, DR patterns), check out the previous blog to get an overview.

In any DR pattern you need to understand what steps need to be taken before a disaster hits, what happens when a disaster hits and what needs to happen after the disaster has passed.

Hot DR Pattern — How does it work?

Steps to be taken before disaster hits

Create a VPC network
Configure the connectivity between the on-premises network and the Google Cloud network
Create custom images of the servers in Google Cloud with exactly the same configuration as the on-premises servers.
Configure replication between the on-premises database server and the one on Google Cloud. Remember that if your database system permits only a single writable database instance when you configure replication, you might need to ensure that one of the database replicas acts as a read-only server.
Create individual instance templates that use the images for the application servers and the web server.
Configure regional managed instance groups for the application and web servers.
Configure health checks using Stackdriver Monitoring.
Configure load balancing using the regional managed instance groups that we configured.
Configure a scheduled task to create regular snapshots of the persistent disks.
Lastly, configure a DNS service to distribute traffic between your on-premises environment and the Google Cloud environment.
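The single-writer caveat in the replication step can be pictured with a small sketch. This is only an illustration in plain Python, not how any particular database handles it: the `DatabaseNode` and `SingleWriterRouter` classes and the node names are hypothetical, and the sketch assumes exactly one writable instance with the other acting as a read-only replica.

```python
from dataclasses import dataclass


@dataclass
class DatabaseNode:
    name: str        # hypothetical label for a database server
    writable: bool   # True only for the single writable instance


class SingleWriterRouter:
    """Sends writes to the one writable node and spreads reads over all nodes."""

    def __init__(self, nodes):
        writers = [n for n in nodes if n.writable]
        if len(writers) != 1:
            raise ValueError("exactly one writable node is required")
        self.nodes = list(nodes)
        self.primary = writers[0]
        self._next_read = 0

    def node_for_write(self):
        return self.primary

    def node_for_read(self):
        # Round-robin across every node, including the read-only replica.
        node = self.nodes[self._next_read % len(self.nodes)]
        self._next_read += 1
        return node


# Assumed layout: on-premises stays writable, Google Cloud holds the replica.
router = SingleWriterRouter([
    DatabaseNode("onprem-db", writable=True),
    DatabaseNode("gcp-db-replica", writable=False),
])
print(router.node_for_write().name)
```

Both environments can serve reads, but every write lands on the same instance, which is what keeps the two copies from diverging.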

With this hybrid approach, you need to use a DNS service that supports weighted routing to the two production environments so that you can serve the same application from both.
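To get a feel for what weighted routing does, here is a minimal simulation in plain Python. It does not call any real DNS API. The 50/50 split, the environment names and the `pick_environment` helper are assumptions made for illustration, and a real DNS service applies the weights when it answers queries rather than per request like this.

```python
import random
from collections import Counter


def pick_environment(weights, rng):
    """Pick one environment with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical 50/50 split between the two production environments.
weights = {"on-premises": 50, "google-cloud": 50}

rng = random.Random(42)  # seeded so the simulation is repeatable
tally = Counter(pick_environment(weights, rng) for _ in range(10_000))
print(tally)
```

Over many lookups, each environment receives a share of traffic close to its weight, so changing the weights shifts load between on-premises and Google Cloud without touching the application.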

Steps to be taken when disaster hits

If the on-premises environment fails, you just disable DNS routing to the on-premises web server, and that’s it! In most cases, the DNS service supports health checks and automatically routes all traffic to the healthy servers on Google Cloud.
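The failover behavior described above can be sketched as a small function that drops any environment whose health check failed. This is a simplified model in plain Python, not the logic of a specific DNS product: the function names, the weights and the health map are all hypothetical.

```python
def effective_weights(weights, healthy):
    """Keep only environments that passed their health check."""
    live = {name: w for name, w in weights.items() if healthy.get(name, False)}
    if not live:
        raise RuntimeError("no healthy environment left to receive traffic")
    return live


def traffic_share(weights):
    """Convert raw weights into the fraction of traffic each environment gets."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


weights = {"on-premises": 50, "google-cloud": 50}  # assumed normal split

# Normal operation: both environments pass their health checks.
normal = traffic_share(effective_weights(
    weights, {"on-premises": True, "google-cloud": True}))

# Disaster: the on-premises check fails, so Google Cloud takes all traffic.
disaster = traffic_share(effective_weights(
    weights, {"on-premises": False, "google-cloud": True}))

print(normal)
print(disaster)
```

Because the Google Cloud side is already serving production traffic, failing over is only a routing change, which is why this pattern can reach near-zero RTO.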

Steps to be taken after the disaster has passed

When the production environment is running on-premises again and can support production workloads, Mane-street Art has to do the following:

Resynchronize databases.
If the database system doesn’t automatically promote a read-only replica to be the writable primary on failure, you need to intervene to ensure that the replica is promoted.
After ensuring that, re-enable DNS routing to distribute traffic to both the on-premises environment and Google Cloud.
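The recovery steps above can be read as a gate: traffic goes back to on-premises only once every check passes. Here is a minimal sketch of that gate in plain Python. The `RecoveryState` fields, the zero-lag threshold and the 50/50 target split are assumptions for illustration, and in practice the values would come from your own database and monitoring tools.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecoveryState:
    onprem_healthy: bool               # on-premises can take production load
    replication_lag_seconds: float     # how far the databases are out of sync
    writable_primary: Optional[str]    # name of the writable database, if any


def failback_blockers(state, max_lag_seconds=0.0):
    """Return the reasons why on-premises traffic must stay disabled."""
    blockers = []
    if not state.onprem_healthy:
        blockers.append("on-premises environment is not healthy")
    if state.replication_lag_seconds > max_lag_seconds:
        blockers.append("databases are not resynchronized")
    if state.writable_primary is None:
        blockers.append("no replica has been promoted to writable primary")
    return blockers


def dns_weights_after_recovery(state):
    """Restore the split only when nothing blocks failback."""
    if failback_blockers(state):
        return {"on-premises": 0, "google-cloud": 100}
    return {"on-premises": 50, "google-cloud": 50}


# Hypothetical state: the site is back, but the databases are still catching up.
catching_up = RecoveryState(True, 12.5, "gcp-db")
print(failback_blockers(catching_up))
print(dns_weights_after_recovery(catching_up))
```

Keeping the DNS change as the last step means users are never routed to an on-premises database that is behind or has no clear writable primary.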

Conclusion

If you are running your application on-premises and need very small RTO and RPO values, then hopefully you learned how to recover from a failure by running a hot HA setup across your on-premises environment and Google Cloud. Stay tuned for upcoming articles, where you will learn to set up more DR patterns that make sense for your business.

Next steps

Follow this blog series on Google Cloud Platform Medium.
Reference the DR solutions.
Follow the Get Cooking in Cloud video series and subscribe to the Google Cloud Platform YouTube channel.
Want more stories? Check my Medium and follow me on Twitter.
Enjoy the ride with us through this miniseries and learn about more such Google Cloud solutions :)