Implementing Automated VM Tagging and Protection with GCP Backup & DR

Uri Arriaga
google-cloud-hispanoamerica
Jul 15, 2024

Authors: Ejglopez & Uri Arriaga

Overview:

Manual configuration can be the right option for a small number of Compute Instances, but it becomes a time-consuming and error-prone activity as the organization grows. This can lead to inconsistent protection and limited visibility and control over your backup strategy.

One alternative to configuring backups manually for each VM is to create logical groups, which apply a common backup policy to all of the group’s applications or hosts. However, this approach still requires you to configure the logical groups and assign applications to them manually.

Dynamic tagging for Google Cloud Backup and DR addresses the inconsistencies of manual backup configuration by automating it based on predefined tags. This simplifies management, ensures consistent protection for all qualifying Compute Instances, and enables granular control for different types of Compute Instances or applications. The approach reduces the risk of human error and data loss, while improving the overall efficiency and reliability of your backup strategy.

This document expands upon the automated VM tagging workflow to seamlessly incorporate Google Cloud’s Backup and Disaster Recovery (Backup & DR) solution. This integration enables the automatic protection of Compute Instances based on the tags applied by our workflow.

Key Benefits:

  • Simplified Management: Unified process for tagging and protecting Compute Instances. Reduce the administrative overhead of configuring VM backups at scale by updating the policy attached to the tag rather than at an individual VM backup level.
  • Consistent Protection: Ensure that new Compute Instances with matching tags are automatically protected.
  • Granular Control: Flexible tag-based policies for different backup needs.
  • Increased Flexibility: Create different backup policies for different types of Compute Instances or applications by using different tags.

Prerequisites to automate backups:

Certain prerequisites need to be in place before implementing automated backups with tagging:

Roles and permissions required:

  • To create, update, and delete tag definitions for Compute Engine resources, you need the Tag Administrator role (roles/resourcemanager.tagAdmin).
  • To add and remove tags attached to resources, you need the Tag User role (roles/resourcemanager.tagUser).

To create, update, and delete Dynamic Protection Tags inside Backup and DR, you need one of the following roles:

  • Backup and DR Admin (roles/backupdr.admin)
  • Backup and DR Backup User (roles/backupdr.backupUser)
  • Backup and DR User V2 (roles/backupdr.userv2)

Alternatively, create a custom role with the following permissions:

  • backupdr.managementServers.listDynamicProtection
  • backupdr.managementServers.getDynamicProtection
  • backupdr.managementServers.createDynamicProtection
  • backupdr.managementServers.deleteDynamicProtection
  • compute.instances.listEffectiveTags
  • serviceusage.services.enable (to enable the required APIs)

Note: these permissions are required to assign the tags and configure the effective backup templates.

To deploy the Cloud Function, Pub/Sub, and Asset Inventory workflow, you will need the following roles:

  • roles/cloudasset.viewer: This role is necessary to view assets in Cloud Asset Inventory. It allows the system to identify when new Compute Instances are created.
  • roles/pubsub.editor: This role lets you manage Pub/Sub topics and subscriptions. You’ll need it to create the topic and subscription that will trigger your Cloud Function.
  • roles/cloudfunctions.developer: This role is essential to deploy and manage your Cloud Function.
  • roles/cloudfunctions.invoker: This role is needed to allow the Pub/Sub topic to trigger your Cloud Function.
  • roles/compute.instanceAdmin.v1: This role lets you create and modify Compute Instances and run the tests. It can be removed after the testing phase and replaced by roles/compute.viewer, which is enough to read information from the instances.

Finally, define your tagging strategy. Tags might reflect criticality levels, service names, or environments.

Architecture:

  1. GCE VM: The provisioned VM instance.
  2. Asset Inventory: Monitors resource changes and emits events.
  3. Pub/Sub Topic: Central message broker for VM creation events.
  4. Cloud Function: Serverless function triggered by Pub/Sub, responsible for:
       • Adding tags to new Compute Instances
       • Evaluating tags to determine Backup & DR policy applicability
  5. GCE API: Used by the Cloud Function to update VM tags.
  6. Backup & DR Policy: Pre-configured policy defining backup schedules, retention, etc.
  7. Backup and DR Backup Appliance: This backup appliance can be in the same region as the GCE Compute Instances or in a different one. The appliance can also be in a different VPC, as long as that VPC is connected to the VPC where you will create the GCE instances to be tagged.

Implementation Steps:

1. Set Global Variables & Enable APIs

  • Env Variables: Create the variables that are used across several commands
export PROJECT_ID=project_ej
export TOPIC=vm-creation-events
export REGION=us-central1
  • Enable APIs
gcloud services enable cloudasset.googleapis.com
gcloud services enable pubsub.googleapis.com
gcloud services enable cloudfunctions.googleapis.com
gcloud services enable compute.googleapis.com
gcloud services enable eventarc.googleapis.com

2. Pub/Sub Topic Setup:

  • Topic Creation: Create the topic (e.g., vm-creation-events) if it doesn’t exist.
gcloud pubsub topics create $TOPIC

3. Asset Inventory Configuration:

Asset Inventory detects when a new instance is created and generates an event that triggers the complete process.

  • Feed Creation: Point the feed to the designated Pub/Sub topic (e.g., vm-creation-events).
gcloud asset feeds create vm-creation-events \
--project=$PROJECT_ID \
--pubsub-topic="projects/$PROJECT_ID/topics/$TOPIC" \
--content-type=resource \
--asset-types=compute.googleapis.com/Instance \
--condition-expression="temporal_asset.prior_asset_state == google.cloud.asset.v1.TemporalAsset.PriorAssetState.DOES_NOT_EXIST"

You can use the following command to describe the feed you created:

gcloud asset feeds describe vm-creation-events --project=$PROJECT_ID
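
The condition expression above limits the feed to assets that did not exist before the change, in other words, to creations. The same check can be repeated defensively on the consumer side. The following is only a minimal sketch: it assumes the JSON message published by the feed carries the prior state in a top-level `priorAssetState` field, and the sample messages are hypothetical and trimmed, so verify the shape against a real message from your topic.

```python
import json


def is_creation_event(message_json: str) -> bool:
    """Return True when a feed message describes a newly created asset."""
    payload = json.loads(message_json)
    # Assumption: "priorAssetState" is the JSON form of
    # temporal_asset.prior_asset_state used in the feed condition.
    return payload.get("priorAssetState") == "DOES_NOT_EXIST"


# Hypothetical, trimmed sample messages (not captured from a real feed).
created = json.dumps({
    "asset": {
        "name": "//compute.googleapis.com/projects/my-project"
                "/zones/us-central1-a/instances/vm-1"
    },
    "priorAssetState": "DOES_NOT_EXIST",
})
updated = json.dumps({
    "asset": {
        "name": "//compute.googleapis.com/projects/my-project"
                "/zones/us-central1-a/instances/vm-1"
    },
    "priorAssetState": "PRESENT",
})

print(is_creation_event(created), is_creation_event(updated))
```

A guard like this is optional when the feed condition is in place, but it keeps the function from tagging on unexpected messages if the feed is later reconfigured.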

4. Cloud Function Deployment:

  • Copy the code below to a file called main.py.

This code performs the following tasks:

  1. create_tag_binding function:
       • Takes the instance ID, zone, project ID, and tag value as parameters.
       • Creates a TagBinding object associated with the given instance.
       • Builds a request to create this tag binding.
       • Executes the request and prints the result of the tagging operation.
  2. subscribe function:
       • Decorated as a Cloud Function entry point, triggered by Cloud Pub/Sub events.
       • Receives a CloudEvent, in this case an instance creation.
       • Extracts the relevant details from the event’s data (instance ID, name, zone, project).
       • Checks whether the instance name contains "gke-", which suggests it is part of a GKE cluster.
       • If it does not, calls create_tag_binding to tag the instance with a specific value (tagValues/281480423597582).
       • To tag different types of machines, you could add if statements that use the names as a differentiator.

Note: tagValues are different for each organization, so make sure that you copy the correct value into the code.
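
The idea of adding if statements per machine type can also be organized as a small rule table, which keeps the naming convention in one place. This is only a sketch under assumptions: the prefixes, the `TAG_RULES` table, the `pick_tag_value` helper, and the tag value IDs are all hypothetical placeholders, not values from this deployment.

```python
from typing import Optional

# Hypothetical rules: (instance name prefix, tag value to bind).
# Replace the placeholder IDs with tag values from your own organization.
TAG_RULES = [
    ("db-", "tagValues/111111111111"),
    ("web-", "tagValues/222222222222"),
]


def pick_tag_value(instance_name: str) -> Optional[str]:
    """Return the tag value for the first matching prefix, or None."""
    for prefix, tag_value in TAG_RULES:
        if instance_name.startswith(prefix):
            return tag_value
    return None


for candidate in ("db-orders-1", "web-frontend-2", "scratch-vm"):
    print(candidate, "->", pick_tag_value(candidate))
```

Inside subscribe, the result could then decide whether create_tag_binding is called at all: a None result means no rule matched and the instance is left untagged.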

from cloudevents.http import CloudEvent
from google.cloud import resourcemanager_v3
from google.api_core.client_options import ClientOptions
import base64
import json
import functions_framework


def create_tag_binding(instanceId, zone, projectId, tagValue):
    # Set the location-specific endpoint (for a VM, the endpoint of its zone)
    regional_endpoint = f"{zone}-cloudresourcemanager.googleapis.com"
    print(regional_endpoint)
    client_options = ClientOptions(api_endpoint=regional_endpoint)

    # Create a client with the specified endpoint
    client = resourcemanager_v3.TagBindingsClient(client_options=client_options)

    # Describe the binding between the instance and the tag value
    tag_binding = resourcemanager_v3.TagBinding()
    tag_binding.parent = f"//compute.googleapis.com/projects/{projectId}/zones/{zone}/instances/{instanceId}"
    tag_binding.tag_value = tagValue

    # Initialize request argument(s)
    request = resourcemanager_v3.CreateTagBindingRequest(tag_binding=tag_binding)

    # Make the request
    operation = client.create_tag_binding(request=request)

    # Wait for the operation to finish and print the result
    print(operation.result())


# Triggered from a message on a Cloud Pub/Sub topic.
@functions_framework.cloud_event
def subscribe(cloud_event: CloudEvent) -> None:
    # Parse the asset from the Pub/Sub message (the JSON is decoded only once)
    data = base64.b64decode(cloud_event.data["message"]["data"]).decode()
    asset = json.loads(data)["asset"]
    instanceNum = asset["resource"]["data"]["id"]

    # Asset name: //compute.googleapis.com/projects/<project>/zones/<zone>/instances/<name>
    name_parts = asset["name"].split("/")
    name = name_parts[-1]
    zone = name_parts[-3]
    projectId = name_parts[-5]

    # Log the parsed data
    print("Projectid=" + projectId + ", Zone=" + zone + ", InstanceNum=" + instanceNum)

    # Tag only newly created VMs whose name does not contain the "gke-" pattern
    if "gke-" not in name:
        create_tag_binding(instanceNum, zone, projectId, "tagValues/281480423597582")
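
Before deploying, the parsing step can be exercised locally without any Google Cloud access. The sketch below mirrors the field extraction used in subscribe and feeds it a synthetic event. The helper name `parse_instance_event` and the sample project, zone, instance name, and ID are hypothetical; only the field paths come from the function code.

```python
import base64
import json


def parse_instance_event(event_data: dict) -> dict:
    """Extract instance details from a Pub/Sub-wrapped Asset Inventory message."""
    raw = base64.b64decode(event_data["message"]["data"]).decode()
    asset = json.loads(raw)["asset"]
    # Asset name layout:
    # //compute.googleapis.com/projects/<project>/zones/<zone>/instances/<name>
    parts = asset["name"].split("/")
    return {
        "project": parts[-5],
        "zone": parts[-3],
        "name": parts[-1],
        "instance_id": asset["resource"]["data"]["id"],
    }


# Synthetic event with made-up values, shaped like the data subscribe reads.
sample_message = {
    "asset": {
        "name": "//compute.googleapis.com/projects/my-project"
                "/zones/us-central1-a/instances/test-vm-1",
        "resource": {"data": {"id": "1234567890123456789"}},
    }
}
event_data = {
    "message": {
        "data": base64.b64encode(json.dumps(sample_message).encode()).decode()
    }
}

parsed = parse_instance_event(event_data)
print(parsed)
```

Keeping the parsing in a separate, pure function like this makes it easy to unit test, while the call that creates the tag binding stays the only part that needs real credentials.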

Note: the tag must be defined before you deploy this Cloud Function. See the Backup and DR documentation for the process to create Dynamic Protection Tags.

Once you have created a tag key, you can add accepted values for the key.

You can retrieve the Tag Value ID through the console or by executing the following CLI commands:

export PROJECT_NAME=schedule-ejglopez
export TAGVALUE_SHORT_NAME=general
export TAGKEY_NAME=$(gcloud resource-manager tags keys list --parent=projects/$PROJECT_NAME --filter='SHORT_NAME=backupdr-dynamicprotect' --format="value(name)")
gcloud resource-manager tags values list --parent=$TAGKEY_NAME --filter=SHORT_NAME=$TAGVALUE_SHORT_NAME --format="value(name)"
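
The last command prints the permanent name of the tag value, in the form tagValues/ followed by a numeric ID. Because the short name ("general") and the permanent name are easy to mix up, a small check before pasting the value into main.py can help. This is a minimal sketch; the helper `is_tag_value_name` is hypothetical and only validates the format, not whether the value exists in your organization.

```python
import re

# Permanent tag value names have the form tagValues/<numeric id>.
TAG_VALUE_PATTERN = re.compile(r"tagValues/\d+")


def is_tag_value_name(value: str) -> bool:
    """Return True if the string looks like a permanent tag value name."""
    # strip() tolerates the trailing newline of copied command output.
    return TAG_VALUE_PATTERN.fullmatch(value.strip()) is not None


for candidate in ("tagValues/281480423597582\n", "general", "tagKeys/123"):
    print(repr(candidate), is_tag_value_name(candidate))
```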

IMPORTANT: In the last line of the Cloud Function code (main.py), replace the tag value with the one you obtained in the previous step:

create_tag_binding(instanceNum, zone, projectId, "tagValues/281480423597582")

  • Create a requirements.txt file and add the following lines to it:
functions-framework==3.5.0
google-cloud-resource-manager
  • Create the function using the command below, which sets the trigger to the Pub/Sub topic (vm-creation-events):
gcloud functions deploy python-pubsub-function \
--gen2 \
--runtime=python312 \
--region=$REGION \
--source=. \
--entry-point=subscribe \
--trigger-topic=$TOPIC
  • Assign an appropriate service account to the Cloud Function with the following roles:
       • Tag User
       • Compute Admin

Note: if the message “API [eventarc.googleapis.com] not enabled on project [schedule-xxxx]. Would you like to enable and retry (this will take a few minutes)? (y/N)?” appears, select y.

Wait for the deployment to complete and verify that the function works.

5. Backup & DR Configuration:

Policy Creation: Create a Backup & DR template with the desired schedules and targets.

Create a unique profile.

Here are the steps to create Dynamic Protection Tag Values for your Compute Engine instances:

  • In the Google Cloud Backup and DR Console, go to the Backup Plans page.
  • Click on the “Dynamic Protection Tags” tab.
  • Click on the “Create Dynamic Protection Tag” button at the upper right corner.
  • Enter a unique Tag Value, which should already exist as a tag value. In this case we are using “general”; the name should correspond to the tagValue used in the Cloud Function code.
  • Choose a corresponding Template and Profile to be associated with this Tag Value.
  • Click on the “Save” button. A Dynamic Protection Tag value is created.
  • Dynamic Protection (Optional): Enable dynamic protection based on VM tags, ensuring that Compute Instances with specific tags are automatically included in the policy.

6. Testing the Deployment:

To test the configuration, create a VM instance in the corresponding VPC. The name should not contain “gke-”, because the function code only tags instances whose names do not contain that string.

After the instance is created, you should see the tags assigned to it.

You can also review the execution logs of the Cloud Function.

Make sure the tagValues entry inside the textPayload corresponds to the desired tag.

Troubleshooting:
