Hot Disaster recovery on Google Cloud for applications running on-premises
Get Cooking in Cloud
“Get Cooking in Cloud” is a blog and video series to help enterprises and developers build business solutions on Google Cloud. In this second miniseries, I am covering disaster recovery on Google Cloud. Disasters can be pretty hard to deal with when you have an online presence. In this series of blogs, we will elaborate on how to deal with disasters like earthquakes, power outages, floods, and fires. If you are interested in the prior miniseries, check it out.
Here is the plan for the series.
- Disaster Recovery Overview
- Cold Disaster recovery on Google Cloud for on-premise applications
- Warm Disaster recovery on Google Cloud for on-premise applications
- Hot Disaster recovery on Google Cloud for on-premise applications (This article)
- Cold Disaster recovery for applications in Google Cloud
- Warm Disaster recovery for applications in Google Cloud
- Hot Disaster recovery for applications in Google Cloud
- Disaster recovery on Google Cloud for Data: Part 1
- Disaster recovery on Google Cloud for Data: Part 2
In this article, you will learn to set up a Hot DR pattern for your applications that are deployed on-premises. So, read on!
What you’ll learn
- Hot DR pattern with an example
- Steps to be taken before a disaster hits
- Steps to be taken during a disaster
- Steps to be taken after a disaster
Prerequisites
- Basic concepts and constructs of Google Cloud so you can recognize the names of the products.
- Read the overview article before continuing on.
Check out the video
Let’s learn the Hot DR pattern with an example
In the last two articles, we talked about Mane-street Art, a company that runs its applications on-premises and is building a DR infrastructure on Google Cloud. They started with a Cold DR plan and then moved to a Warm standby because they needed lower RTO and RPO values.
Now Mane-street Art has become really popular and cannot afford to be down even for seconds. Since they require near-zero RTO and RPO values, the only way to get there is to run an HA architecture across their on-premises production environment and Google Cloud concurrently.
Note: If you are unfamiliar with the terms used here (RTO, RPO, DR patterns), check out the previous blog to get an overview.
In any DR pattern, you need to understand what steps to take before a disaster hits, what happens when a disaster hits, and what needs to happen after the disaster has passed.
Hot DR Pattern — How does it work?
Steps to be taken before disaster hits
- Create a VPC network
- Configure the connectivity between the on-premises network and the Google Cloud network
- Create custom images of the servers in Google Cloud with exactly the same configuration as on-premises.
- Configure replication between the on-premises database server and the one on Google Cloud. Remember that if your database system permits only a single writable instance when you configure replication, you might need to make sure that one of the database replicas acts as a read-only server.
- Create individual instance templates that use the images for the application servers and the web server.
- Configure regional managed instance groups for the application and web servers.
- Configure health checks using Stackdriver Monitoring (now called Cloud Monitoring).
- Configure load balancing using the regional managed instance groups that we configured.
- Configure a scheduled task to create regular snapshots of the persistent disks.
- Lastly, configure a DNS service to distribute traffic between your on-premises environment and the Google Cloud environment.
With this hybrid approach, you need a DNS service that supports weighted routing to the two production environments so that you can serve the same application from both.
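To make weighted routing a bit more concrete, here is a minimal sketch in Python that simulates how a DNS service might split lookups between the two environments. The environment names, the 50/50 weights, and the helper functions are all hypothetical and only for illustration. A real DNS service does this for you through its own routing policy configuration.

```python
import random
from collections import Counter

# Hypothetical environments and weights, chosen only for illustration.
WEIGHTS = {"on-premises": 50, "google-cloud": 50}


def pick_environment(weights, rng):
    """Return one environment name, chosen in proportion to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, lookups, seed=0):
    """Count how many simulated DNS lookups land on each environment."""
    rng = random.Random(seed)  # seeded so repeated runs give the same split
    return Counter(pick_environment(weights, rng) for _ in range(lookups))


if __name__ == "__main__":
    # With equal weights, each environment should get roughly half the lookups.
    print(simulate(WEIGHTS, 10_000))
```

Changing the weights (for example 80 and 20) shifts the share of traffic each environment receives, which is handy if one side has less capacity than the other.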
Steps to be taken when disaster hits
If the on-premises environment fails, you just disable DNS routing to the on-premises web server, and that’s it! In most cases, the DNS service supports health checks and will automatically route all the traffic to the healthy servers on Google Cloud.
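The failover behavior can be sketched as a small Python function that drops unhealthy environments and rescales the remaining weights. This is only a simulation of the idea under assumed names (`effective_weights`, the environment labels, the health map). It is not the API of any particular DNS service.

```python
def effective_weights(weights, healthy):
    """Drop unhealthy environments and rescale the rest to sum to 100.

    weights: mapping of environment name to its configured routing weight.
    healthy: mapping of environment name to the latest health check result.
    Environments missing from `healthy` are treated as unhealthy.
    """
    live = {
        name: weight
        for name, weight in weights.items()
        if healthy.get(name, False) and weight > 0
    }
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no healthy environment left to serve traffic")
    return {name: 100 * weight / total for name, weight in live.items()}


if __name__ == "__main__":
    configured = {"on-premises": 50, "google-cloud": 50}
    # Simulate the on-premises health check failing during a disaster.
    status = {"on-premises": False, "google-cloud": True}
    print(effective_weights(configured, status))
```

When the on-premises health check fails in this sketch, all of the traffic goes to Google Cloud, which is the behavior you want from the DNS service during a disaster.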
Steps to be taken after the disaster has passed
When the production environment is running on-premises again and can support production workloads, Mane-street Art has to do the following:
- Resynchronize databases.
- If the database system doesn’t automatically promote a read-only replica to be the writable primary on failure, you need to intervene to ensure that the replica is promoted.
- After that, re-enable DNS routing so that traffic is distributed to both on-premises and Google Cloud again.
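The failback steps above can be sketched as a simple pre-flight check that you might run before re-enabling DNS routing. Everything here is an assumption for illustration: the `DatabaseState` fields, the `ready_for_failback` helper, and the rule that exactly one instance is writable (which applies only if your database system permits a single writable instance). Your database will expose its own way to read replication status.

```python
from dataclasses import dataclass


@dataclass
class DatabaseState:
    """Hypothetical snapshot of one database instance's replication status."""
    name: str
    writable: bool
    replication_lag_seconds: float


def ready_for_failback(databases, max_lag_seconds=0.0):
    """Return (ok, reasons) describing whether DNS routing can be re-enabled.

    Assumes a single-writer setup: exactly one instance must be writable,
    and every read-only replica must have caught up with it.
    """
    reasons = []
    primaries = [db.name for db in databases if db.writable]
    if len(primaries) != 1:
        reasons.append(
            f"expected exactly one writable primary, found {len(primaries)}"
        )
    lagging = [
        db.name
        for db in databases
        if not db.writable and db.replication_lag_seconds > max_lag_seconds
    ]
    if lagging:
        reasons.append("replicas still catching up: " + ", ".join(lagging))
    return (not reasons, reasons)


if __name__ == "__main__":
    state = [
        DatabaseState("google-cloud-db", writable=True, replication_lag_seconds=0.0),
        DatabaseState("on-premises-db", writable=False, replication_lag_seconds=12.5),
    ]
    print(ready_for_failback(state))
```

Only when the check passes would you go ahead and turn DNS routing to the on-premises environment back on.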
If you are running your application on-premises and need near-zero RTO and RPO values, then hopefully you learned how to recover the environment from failure by running a hot HA setup across on-premises and Google Cloud. Stay tuned for upcoming articles, where you will learn to set up more DR patterns that make sense for your business.
- Follow this blog series on Google Cloud Platform Medium.
- Reference the DR solutions.
- Follow the Get Cooking in Cloud video series and subscribe to the Google Cloud Platform YouTube channel.
- Want more stories? Check my Medium, follow me on Twitter.
- Enjoy the ride with us through this miniseries and learn about more such Google Cloud solutions :)