Schedule Machine Images: Backup and DR solution for Google Compute VM Instances

Vishal Bulbule
Google Cloud - Community
5 min read, Apr 15, 2023

Introduction

Greetings everyone! Welcome back! I have received numerous inquiries regarding the most effective backup and disaster recovery solution for GCP VMs. In this article, I will present one of the top solutions for backing up and recovering GCP VMs. It’s worth noting that there is no one-size-fits-all solution that can cater to all needs, as it varies depending on the specific requirements. Additionally, for those who prefer video content, I have also created a YouTube video on this topic that you can check out.

Backup and Disaster Recovery (DR) are important for Google Cloud Platform (GCP) VMs to protect against data loss and ensure business continuity in the event of a disaster or outage.

While Google Cloud does not offer a single, comprehensive managed solution for backup and DR of Compute VMs, it provides a range of services that can be utilized to implement backup and DR strategies based on specific business needs.

Options Available in Google Cloud

1. Snapshot

A disk snapshot is a point-in-time copy of a persistent disk. Disk snapshots are incremental, meaning only the changes made since the last snapshot are captured in subsequent snapshots. Snapshots are easy to schedule, but recovery is harder. Consider the following example:

If you have 10 GCP VMs with 4 disks attached to each VM, and you have a snapshot policy attached to each disk that creates daily snapshots, recovering from a disaster can be a complex and time-consuming process. In order to fully recover your application, you would need to identify and recover the latest snapshot for each disk. However, even if you manage to recover all of the disks, you may still face challenges recovering Instance metadata such as network tags used for firewalls, labels used for billing, startup scripts, and other configuration information.
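To make that recovery effort concrete, here is a minimal sketch of just the first step: picking the newest snapshot for every disk. It assumes the snapshots have already been listed as dictionaries carrying the `name`, `sourceDisk`, and `creationTimestamp` fields of the Compute Engine snapshot resource; the helper name and the sample data are hypothetical.

```python
from datetime import datetime


def latest_snapshot_per_disk(snapshots):
    """Return {sourceDisk: snapshot} holding the newest snapshot of each disk."""
    latest = {}
    for snap in snapshots:
        disk = snap["sourceDisk"]
        created = datetime.fromisoformat(snap["creationTimestamp"])
        current = latest.get(disk)
        if current is None or created > datetime.fromisoformat(current["creationTimestamp"]):
            latest[disk] = snap
    return latest


# Hypothetical listing: two disks of one VM, two daily snapshots each
sample_snapshots = [
    {"name": "vm1-boot-0414", "sourceDisk": "zones/us-central1-a/disks/vm1-boot",
     "creationTimestamp": "2023-04-14T02:00:00.000-07:00"},
    {"name": "vm1-boot-0415", "sourceDisk": "zones/us-central1-a/disks/vm1-boot",
     "creationTimestamp": "2023-04-15T02:00:00.000-07:00"},
    {"name": "vm1-data-0415", "sourceDisk": "zones/us-central1-a/disks/vm1-data",
     "creationTimestamp": "2023-04-15T02:05:00.000-07:00"},
    {"name": "vm1-data-0414", "sourceDisk": "zones/us-central1-a/disks/vm1-data",
     "creationTimestamp": "2023-04-14T02:05:00.000-07:00"},
]

for disk, snap in latest_snapshot_per_disk(sample_snapshots).items():
    print(disk, "->", snap["name"])
```

Even with this in place, every disk still has to be recreated and attached, and the instance configuration rebuilt by hand.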

So snapshots alone are definitely not the best backup and disaster recovery solution for Compute VMs.

2. Machine Image

A machine image is a Compute Engine resource that stores all the configuration, metadata, permissions, and data from multiple disks of a virtual machine (VM) instance. You can use a machine image in many system maintenance, backup and recovery, and instance cloning scenarios.

But why can't we use machine images alone as a backup and DR solution for Compute VMs?

Because in GCP, there is no scheduling feature for Machine images :(

3. Backup and DR service (New)

In my personal experience and opinion, I have found that this new service is not suitable or mature enough for use in production environments. Additionally, there are several challenges associated with this tool, such as inadequate documentation, high management overhead, and cost. You can still explore this service and suggest better solutions.

Then what is the solution now?

While there is no built-in, fully managed solution for scheduling machine images in Google Cloud, we can still achieve this by utilizing Cloud Scheduler and Cloud Functions to set up a schedule for creating the machine images.

[Figure: Architecture for backing up GCP VMs using Machine Image, Cloud Function, and Cloud Scheduler]

Steps -

Cloud Function Setup

  1. Use the Python script below to create machine images daily and delete the previous day's machine images, so that only the latest copy is kept. You can customize the frequency based on your needs.
import time
from datetime import date, timedelta

import googleapiclient.discovery
from googleapiclient.errors import HttpError


def create_machineimage(event, context):
    """Pub/Sub-triggered entry point: create today's machine image for every
    VM in the zone, then delete yesterday's image."""
    compute = googleapiclient.discovery.build('compute', 'v1')
    project = '<project-id>'  # replace with your project id
    zone = 'us-central1-a'

    # Date suffixes used in the machine image names, e.g. 20230415
    today = date.today().strftime('%Y%m%d')
    yesterday = (date.today() - timedelta(days=1)).strftime('%Y%m%d')
    print(today)

    # 'items' is absent when the zone has no VMs; only the first result page is read
    instances = compute.instances().list(project=project, zone=zone).execute()
    for instance in instances.get('items', []):
        instance_name = instance['name']
        print(instance_name)
        instance_url = 'projects/{0}/zones/{1}/instances/{2}'.format(project, zone, instance_name)
        body = {
            'name': instance_name + today,
            'sourceInstance': instance_url,  # the REST API expects camelCase field names
        }
        compute.machineImages().insert(project=project, body=body).execute()
        print("Machine image creation started")
        time.sleep(5)
        try:
            compute.machineImages().delete(project=project, machineImage=instance_name + yesterday).execute()
            print("Older machine image deleted")
        except HttpError:
            print("No older machine image to delete")

2. Create a 1st gen, Pub/Sub-triggered Cloud Function with the given details.

3. Use “create_machineimage” as the entry point for the Cloud Function, because that is the function name used in the script.

requirements.txt (it needs to list google-api-python-client, the package that provides googleapiclient.discovery)

4. Make sure Cloud Function is deployed successfully.
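The script in step 1 keeps a single latest copy by deleting the previous day's image. If a longer retention window is needed, the names to delete could be computed along the following lines. This is only a sketch: the helper name and the `keep_days` parameter are hypothetical, and it assumes images are named as in the script, that is, the instance name followed by a `YYYYMMDD` suffix.

```python
from datetime import date, datetime, timedelta


def expired_image_names(instance_name, existing_names, today, keep_days=1):
    """Return the machine image names of one instance that fall outside retention.

    keep_days is the number of daily images to retain, counting today's.
    """
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in existing_names:
        if not name.startswith(instance_name):
            continue
        suffix = name[len(instance_name):]
        # Accept only an exact 8-digit date suffix such as 20230415
        if len(suffix) != 8 or not suffix.isdigit():
            continue
        try:
            stamp = datetime.strptime(suffix, "%Y%m%d").date()
        except ValueError:
            continue
        if stamp <= cutoff:
            expired.append(name)
    return sorted(expired)


# Hypothetical image names present in the project
names = ["web20230415", "web20230414", "web20230410", "db20230401", "web-notes"]
print(expired_image_names("web", names, date(2023, 4, 15), keep_days=1))
print(expired_image_names("web", names, date(2023, 4, 15), keep_days=7))
```

With `keep_days=1` this matches the behaviour of the script (only today's image survives), while also cleaning up older images that an earlier failed run may have left behind.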

Cloud Scheduler Setup

  1. Create a Cloud Scheduler job with the required schedule.

  2. As the target, use the same Pub/Sub topic that triggers the Cloud Function.

  3. Monitor the Cloud Scheduler job and the Cloud Function executions.

  4. Once everything is set up successfully, it should create machine images on a daily basis and delete the older ones.
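For readers who prefer to create the job through the Cloud Scheduler REST API rather than the console, the job definition from the steps above might look roughly like the sketch below. The field names (`schedule`, `timeZone`, `pubsubTarget`, `topicName`, `data`) follow the Cloud Scheduler v1 job resource; the helper name, project, location, job, and topic names are all hypothetical placeholders.

```python
import base64


def build_scheduler_job(project, location, job_id, topic, schedule,
                        time_zone="Etc/UTC", message="run"):
    """Build a Cloud Scheduler job body that publishes to a Pub/Sub topic."""
    return {
        "name": f"projects/{project}/locations/{location}/jobs/{job_id}",
        "schedule": schedule,  # unix-cron format
        "timeZone": time_zone,
        "pubsubTarget": {
            "topicName": f"projects/{project}/topics/{topic}",
            # The Pub/Sub payload must be base64-encoded in the REST body
            "data": base64.b64encode(message.encode("utf-8")).decode("ascii"),
        },
    }


# Hypothetical example: run every day at 02:00 UTC
job = build_scheduler_job(
    project="my-project",
    location="us-central1",
    job_id="daily-machine-image",
    topic="machine-image-topic",
    schedule="0 2 * * *",
)
print(job)
```

The function in this article ignores the message content, so any non-empty payload works; the schedule string is the part to adjust when a different backup frequency is required.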

Please note that if you have configured a custom service account for your Cloud Function or Cloud Scheduler, you may need to grant it the necessary permissions, such as the Compute Instance Admin (v1) role (roles/compute.instanceAdmin.v1).

Restoration

Restoring or creating an instance from a machine image is a simple process that can be automated through scripting based on specific DR requirements. With machine image restoration, a fully functional instance can be up and running in just a few minutes without the need for manual recreation of disks, configurations, and settings that were saved in the image.
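As an illustration of how such a restore script might look, the sketch below creates a new VM from a machine image. It assumes a `compute` client built with `googleapiclient.discovery.build('compute', 'v1')`, as in the backup function, and relies on the `sourceMachineImage` parameter of `instances.insert`. The helper name and all argument values are hypothetical, so please verify the call against the current API reference before using it in a DR runbook.

```python
def restore_instance(compute, project, zone, new_name, machine_image):
    """Create a VM named new_name in zone from an existing machine image.

    compute is a Compute Engine v1 API client created by the caller.
    """
    source = f"projects/{project}/global/machineImages/{machine_image}"
    request = compute.instances().insert(
        project=project,
        zone=zone,
        body={"name": new_name},      # other properties come from the machine image
        sourceMachineImage=source,
    )
    # Returns the long-running operation resource, not the finished instance
    return request.execute()


# Hypothetical usage:
# import googleapiclient.discovery
# compute = googleapiclient.discovery.build('compute', 'v1')
# restore_instance(compute, '<project-id>', 'us-central1-a', 'web-restored', 'web20230415')
```

The call returns as soon as the operation is accepted, so a real script would poll the operation until it completes before routing traffic to the restored VM.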

Conclusion

Restoring a VM from a machine image is generally more streamlined and convenient than restoring from a snapshot, because it restores the entire VM, including the boot disk, system state, and any additional disks and configurations that were saved in the machine image. The difference matters most when the VM has a complex configuration or multiple disks.

Please refer to the YouTube video below for the complete implementation.

About Me

As an experienced 8x certified Google Cloud Architect/Data Engineer with over 6 years of expertise in Google Cloud, Data Analytics, and BI, I am passionate about technology and innovation. Being a Champion Innovator and Google Cloud Architect, I am always exploring new ways to leverage cloud technologies to deliver innovative solutions that make a difference.

If you have any queries or would like to get in touch, you can reach me at my email address vishal.bulbule@techtrapture.com or connect with me on LinkedIn at https://www.linkedin.com/in/vishal-bulbule/. For a more personal connection, you can also find me on Instagram at https://www.instagram.com/vishal_bulbule/?hl=en.

Additionally, please check out my YouTube Channel at https://www.youtube.com/@techtrapture for tutorials and demos on Google Cloud.
