Secret Sauce — Google Cloud Functions For Autonomous Global Storage Synchronization

Dean Pratt
6 min read · Mar 31, 2022

Google Cloud Platform offers a number of autonomous solutions to help accomplish manageable business objectives, and one of them is Google Cloud Functions. Some customers I have had the privilege of discussing solutions with have asked for an easy-to-manage, globally replicated storage solution. For example, uploading a file to one “source” bucket triggers a function that copies the newly uploaded file to any of their other specified buckets around the world, and the same process removes the object from all of those buckets when it is deleted from the source. In an effort to provide an easy and efficient bridge, I scripted (with help lol) some Cloud Functions for Google Cloud Storage in Python as a pragmatic, scalable starting point for any business vertical.

The code below covers same-project replication from a source bucket to one or more destination buckets, along with a matching deletion policy for one or more buckets. If there are other storage workflow policies you wish to implement, such as a central sink for all buckets, this code can give you a place to start too. You can create functions triggered by Cloud Storage events easily inside the GCP console:

For replication, choose the Finalize/Create event type, and for deletion, well…Delete lol. Next, select the storage bucket that will trigger the function when the selected event takes place.

After selecting your runtime, build, connection, and security settings, you will have the opportunity to enter your code. I used Python 3.9 for this example and set the entry point to “replication”.
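For context, the event dict that Cloud Functions passes to that entry point is simply the metadata of the Cloud Storage object that fired the trigger. A trimmed, purely illustrative payload (the field values below are made up) looks something like this; the only keys the code in this article relies on are bucket and name:

# Illustrative google.storage.object.finalize payload (values are hypothetical)
event = {
    "bucket": "my-source-bucket",
    "name": "reports/2022-03-31.csv",
    "contentType": "text/csv",
    "size": "2048",
    "timeCreated": "2022-03-31T12:00:00.000Z",
}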

::: Tech It Out ::: #1

Here is the code for single bucket-to-bucket replication. Make sure to change the value of the destination variable (inside the quotation marks) to the bucket name you would like to use, and don’t forget to paste the requirements.txt code as well!

main.py code:

from google.cloud import storage

destination_bucket = "put_your_destination_bucket_name_here"

def replication(event, context):
    """Triggered by a change to a Cloud Storage bucket.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    file = event
    print(f"Processing file: {file['name']}")
    # print(file)  # uncomment to log the full event payload

    try:
        # Initiate the Cloud Storage client
        storage_client = storage.Client()

        # Define the origin bucket
        origin = storage_client.bucket(file['bucket'])

        # Define a blob object from the origin
        blob = origin.get_blob(file['name'])

        # Define the destination bucket
        destination = storage_client.bucket(destination_bucket)

        # Copy the file
        origin.copy_blob(blob, destination)

        return "Done!"

    except Exception as e:
        print(f"Replication failed: {e}")
        return "Failed!"

requirements.txt code:

# Function dependencies, for example:
# package>=version
google-cloud-storage
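Before deploying, you can give the function a quick local smoke test by calling it with a minimal fake event. This is just a sketch: it assumes you have Application Default Credentials with access to both buckets, that the placeholder bucket and object names below are replaced with real ones, and that the test object already exists in the source bucket.

# Hypothetical local smoke test for replication() -- not part of the deployed function
fake_event = {
    "bucket": "my-test-source-bucket",  # must already contain test.txt
    "name": "test.txt",
}
print(replication(fake_event, None))  # prints "Done!" if the copy succeeds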

::: Tech It Out ::: #2

Here is the corresponding deletion policy for the same single bucket-to-bucket scenario.

main.py:

from google.cloud import storage

destination_bucket = "put_your_destination_bucket_name_here"

def deletion(event, context):
    """Triggered by a change to a Cloud Storage bucket.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    file = event
    print(f"Processing file: {file['name']}")
    # print(file)  # uncomment to log the full event payload

    try:
        # Initiate the Cloud Storage client
        storage_client = storage.Client()

        # Define the destination bucket (the deleted object no longer exists
        # in the source bucket, so there is nothing to fetch from the origin)
        destination = storage_client.bucket(destination_bucket)

        # Delete the matching file in the destination bucket
        destination.delete_blob(file['name'])

        return "Done!"

    except Exception as e:
        print(f"Deletion failed: {e}")
        return "Failed!"

requirements.txt:

# Function dependencies, for example:
# package>=version
google-cloud-storage

::: Tech It Out ::: #3

In order to get an object copied to multiple buckets, you will need this code:

main.py:

from google.cloud import storage

destination_bucket1 = "put_your_destination_bucket1_name_here"
destination_bucket2 = "put_your_destination_bucket2_name_here"
destination_bucket3 = "put_your_destination_bucket3_name_here"

def replication(event, context):
    """Triggered by a change to a Cloud Storage bucket.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    file = event
    print(f"Processing file: {file['name']}")
    # print(file)  # uncomment to log the full event payload

    try:
        # Initiate the Cloud Storage client
        storage_client = storage.Client()

        # Define the origin bucket
        origin = storage_client.bucket(file['bucket'])

        # Define a blob object from the origin
        blob = origin.get_blob(file['name'])

        # Copy the file to destination bucket 1
        destination1 = storage_client.bucket(destination_bucket1)
        origin.copy_blob(blob, destination1)

        # Copy the file to destination bucket 2
        destination2 = storage_client.bucket(destination_bucket2)
        origin.copy_blob(blob, destination2)

        # Copy the file to destination bucket 3
        destination3 = storage_client.bucket(destination_bucket3)
        origin.copy_blob(blob, destination3)

        return "Done!"

    except Exception as e:
        print(f"Replication failed: {e}")
        return "Failed!"

requirements.txt:

# Function dependencies, for example:
# package>=version
google-cloud-storage
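If you end up with more than a few destinations, the copy-and-paste pattern above gets repetitive. A more scalable variant (just a sketch, with hypothetical bucket names) keeps the destinations in a list and loops over them, so one failed copy does not stop the others:

from google.cloud import storage

# Hypothetical list of destination buckets -- replace with your own names
DESTINATION_BUCKETS = [
    "put_your_destination_bucket1_name_here",
    "put_your_destination_bucket2_name_here",
    "put_your_destination_bucket3_name_here",
]

def replication(event, context):
    """Copy the newly finalized object to every bucket in DESTINATION_BUCKETS."""
    storage_client = storage.Client()
    origin = storage_client.bucket(event['bucket'])
    blob = origin.get_blob(event['name'])

    for bucket_name in DESTINATION_BUCKETS:
        try:
            destination = storage_client.bucket(bucket_name)
            origin.copy_blob(blob, destination)
            print(f"Copied {event['name']} to {bucket_name}")
        except Exception as e:
            print(f"Copy to {bucket_name} failed: {e}")

    return "Done!"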

::: Tech It Out ::: #4

And here is the script for deleting an object from multiple buckets after it has been deleted from the source bucket:

main.py:

from google.cloud import storage

destination_bucket1 = "put_your_destination1_bucket_name_here"
destination_bucket2 = "put_your_destination2_bucket_name_here"
destination_bucket3 = "put_your_destination3_bucket_name_here"

def deletion(event, context):
    """Triggered by a change to a Cloud Storage bucket.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    file = event
    print(f"Processing file: {file['name']}")
    # print(file)  # uncomment to log the full event payload

    try:
        # Initiate the Cloud Storage client
        storage_client = storage.Client()

        # Delete the file in destination bucket 1
        destination1 = storage_client.bucket(destination_bucket1)
        destination1.delete_blob(file['name'])

        # Delete the file in destination bucket 2
        destination2 = storage_client.bucket(destination_bucket2)
        destination2.delete_blob(file['name'])

        # Delete the file in destination bucket 3
        destination3 = storage_client.bucket(destination_bucket3)
        destination3.delete_blob(file['name'])

        return "Done!"

    except Exception as e:
        print(f"Deletion failed: {e}")
        return "Failed!"

requirements.txt:

# Function dependencies, for example:
# package>=version
google-cloud-storage
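The same list-and-loop idea works for the deletion side. Here is a minimal sketch under the same assumptions (hypothetical bucket names, and a failure on any one bucket is logged rather than stopping the rest):

from google.cloud import storage

# Hypothetical list of destination buckets -- replace with your own names
DESTINATION_BUCKETS = [
    "put_your_destination_bucket1_name_here",
    "put_your_destination_bucket2_name_here",
    "put_your_destination_bucket3_name_here",
]

def deletion(event, context):
    """Delete the object named in the event from every bucket in DESTINATION_BUCKETS."""
    storage_client = storage.Client()

    for bucket_name in DESTINATION_BUCKETS:
        try:
            storage_client.bucket(bucket_name).delete_blob(event['name'])
            print(f"Deleted {event['name']} from {bucket_name}")
        except Exception as e:
            print(f"Delete from {bucket_name} failed: {e}")

    return "Done!"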

::: Tech It Out ::: #5

If you run into issues, be sure to check the logs inside the failed function and make sure you have granted the pubsub.publisher role to the Cloud Storage service agent:

  1. Get the email address of the service agent associated with the project that contains your Cloud Storage bucket.
  2. Use the email address that you obtained in the previous step to give the service agent the IAM role roles/pubsub.publisher for the relevant Pub/Sub topic.

SERVICE_ACCOUNT=$(gsutil kms serviceaccount -p $PROJECT_NUMBER)

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SERVICE_ACCOUNT \
    --role roles/pubsub.publisher

(logs from a Google Cloud Function)

In a very short amount of time, your functions could look something like the picture below, leaving you many options to build from by changing your source and destination buckets. Once again, the buckets must be in the same project for the above examples.

I hope this article provides a good reference and/or inspiration for creative cloud solutions! Here is some helpful additional reading as well:

Happy coding and until next time!!

-Dean

#techitout #GCP #google #storage #functions #cloud #data #monitoring #GCS #technology #automation #logs #infrastructure #management #bucket #object #lifecycle #policies #GCF #replication #copy #delete
