Google Cloud Storage “exploder”

Proof-Of-Concept Only — Use at your own risk

Daz Wilkin
Google Cloud - Community
5 min read · Nov 17, 2017


A colleague asked whether Cloud Functions could provide unzip functionality to Cloud Storage. Another colleague pointed me to a solution — similar in principle — for Firebase Functions that provides image transformation; he also suggested “exploder” as a more evocative title ;-)

My interest was piqued and I developed the following as a proof-of-concept (it works) but I urge you to not use this in production.

It’s an excellent and quite common request from customers that Google Cloud Storage (GCS) should include common file processing tools. The goal would be that these transformations happen service-side to save the complexity of downloading an object from GCS, processing it and returning the result(s) to GCS. For those that are not familiar with GCS, it’s a service in Google Cloud Platform (GCP) that provides effectively limitless “object” aka “BLOB” aka “unstructured” file storage.

You may get started for free with Google Cloud Platform but I suspect, if you’re reading this, you’re likely already a beloved GCP customer and may be seeking a template solution for this problem. Read on, dear customer.

Setup

You’ll need a GCP project, at least 2 GCS buckets, and Cloud Functions enabled:

ROOT=$(whoami)-$(date +%y%m%d)
BILLING=[[YOUR-BILLING-ID]]
PROJECT=[[YOUR-PROJECT-ID]] # e.g. ${ROOT}-gcs-exploder
REGION=[[YOUR-PREFERRED-REGION]]
gcloud alpha projects create $PROJECT
gcloud alpha billing projects link $PROJECT --account-id=$BILLING
gcloud services enable cloudfunctions.googleapis.com \
--project=$PROJECT
for BUCKET in receive explode staging
do
gsutil mb \
-c regional \
-l ${REGION} \
-p ${PROJECT} \
gs://${ROOT}-${BUCKET}
done
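
To confirm the buckets were created, you can (optionally; this check isn't in the original walkthrough) list the project's buckets:

gsutil ls -p ${PROJECT}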

Cloud Functions

Customarily, I’m all command-line, but the Console experience for Cloud Functions is mostly excellent and I find myself preferring it. Let’s do this both ways, starting with the Console:

https://console.cloud.google.com/functions/list?project=${PROJECT}
Cloud Functions: Create Function

Click “Create Function” and:

Cloud Functions: Editor
  • You may choose your own “Name”
  • You may bump the “Memory allocated” (but you likely won’t need more than 256MB)
  • For “Trigger” select “Cloud Storage bucket”
  • Browse and select the bucket you intend to receive the zipped file. If you’re following along with all these instructions, it will be called ${ROOT}-receive.

For convenience, I’m going to use the inline editor.

Replace index.js with:
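
The post’s original inline code isn’t reproduced here, so what follows is a minimal sketch only. It assumes the @google-cloud/storage client (its v1.x-era API) and the unzipper npm package, plus the Node 6 background-function signature of the time; treat the package choices, event field names, and error handling as assumptions rather than the original implementation.

// index.js -- a sketch, not the original code
const storage = require('@google-cloud/storage')(); // v1.x-era API: the module is a factory
const unzipper = require('unzipper');

// Replace [[YOUR-ROOT]] with the value of ${ROOT} (or whatever names you used for the buckets)
const RECEIVE_BUCKET = '[[YOUR-ROOT]]-receive';
const EXPLODE_BUCKET = '[[YOUR-ROOT]]-explode';

exports.processZip = (event, callback) => {
  const object = event.data;

  // Be prudent: ignore deletions and anything that isn't a zip file
  if (object.resourceState === 'not_exists' || !object.name.endsWith('.zip')) {
    console.log(`Ignoring ${object.name}`);
    return callback();
  }

  // Prefix exploded objects with the current epoch (milliseconds) so repeated uploads don't collide
  const prefix = Date.now();
  console.log(`Exploding gs://${RECEIVE_BUCKET}/${object.name} into gs://${EXPLODE_BUCKET}/${prefix}/`);

  const uploads = [];
  storage
    .bucket(RECEIVE_BUCKET)
    .file(object.name)
    .createReadStream()
    .pipe(unzipper.Parse())
    .on('entry', (entry) => {
      if (entry.type === 'Directory') {
        entry.autodrain(); // nothing to write for directory entries
        return;
      }
      // Stream each zip entry straight into the "explode" bucket
      uploads.push(new Promise((resolve, reject) => {
        entry
          .pipe(storage.bucket(EXPLODE_BUCKET).file(`${prefix}/${entry.path}`).createWriteStream())
          .on('finish', resolve)
          .on('error', reject);
      }));
    })
    .on('finish', () => Promise.all(uploads).then(() => callback(), callback))
    .on('error', callback);
};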

NB Replace [[YOUR-ROOT]] wherever it appears in index.js with the value of ${ROOT}, or with whatever names you used to create the buckets.

and replace package.json with:
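
Again, a sketch only: the dependency names follow the index.js sketch above, and the versions are indicative of late 2017 rather than exact.

{
  "name": "gcs-exploder",
  "version": "0.0.1",
  "dependencies": {
    "@google-cloud/storage": "^1.4.0",
    "unzipper": "^0.8.9"
  }
}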

Cloud Functions: function-1
  • Select “Stage bucket”. If you’re following along with all these instructions, it will be called ${ROOT}-staging.
  • Because I renamed the exported function to processZip, please ensure you change the “Function to execute” property to this value.
  • Under “Advanced options”, you may wish to increase the “Timeout” value from the default “60” (seconds). Function invocations are terminated if not completed by this Timeout value. For large zip files, 60 seconds may be insufficient time to explode them.

Click “Create” to deploy the Cloud Function to GCP.

Cloud Functions: deploying…

All being well:

Success!

Testing

Our Cloud Function should be triggered on *all* object changes to our GCS “receive” bucket. You may read more about so-called Object Change Notification (OCN) here. In practice, our Function should be more prudent in filtering the OCNs for those that are intended for our exploder.
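
For example, one prudent filter (an illustration echoing the index.js sketch above, not necessarily the post’s exact logic) is to ignore deletion notifications and anything that isn’t a zip file before doing any work:

// Skip notifications for deleted objects and for non-zip uploads
if (object.resourceState === 'not_exists' || !object.name.endsWith('.zip')) {
  console.log(`Ignoring ${object.name}`);
  return callback();
}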

I recommend you use a reasonably small zip file (${TESTZIP}) to test:

gsutil cp /path/to/${TESTZIP} gs://${ROOT}-receive

and, to confirm, you can run the following command, although the cp command itself should provide sufficient confirmation of success or failure:

gsutil ls gs://${ROOT}-receive

Browsing GCS buckets and objects is facilitated with the Console’s Browser:

https://console.cloud.google.com/storage/browser?project=${PROJECT}

which should show something similar to:

If you drill into the “receive” bucket, you should see your uploaded ${TESTZIP} file:

Browser: uploaded zip(s)

And, if you wait a few seconds and navigate to the “explode” bucket, you should see:

Browser: “folder”

GCS stores objects in a flat namespace (one per bucket). As a convenience, GCS presents objects that contain / in their names as if they formed a conventional directory hierarchy. In the code, you will see that I prefixed the unzipped files with a Unix epoch value (in this case 1510952315489) and a /. The Browser presents the bucket’s contents as if there were a directory called 1510952315489 (your value will differ, but it will be unique and correspond to the epoch value when the zip was exploded), but in fact the image files image-1.jpg, image-2.jpg, etc. are actually named 1510952315489/image-1.jpg, 1510952315489/image-2.jpg, and so on.
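
To make that concrete, the “directory” is nothing more than a prefix baked into each object’s name. A sketch of the naming step (assuming, as in the index.js sketch above, that Date.now() supplies the epoch value in milliseconds):

// One flat object name per file; the "/" only looks like a directory in the Browser
const prefix = Date.now();              // e.g. 1510952315489
const name = `${prefix}/${entry.path}`; // e.g. "1510952315489/image-1.jpg"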

Browser: unzipped files

That worked… my test.zip from the “receive” bucket has now been exploded into the “explode” bucket.

And, you should see our console.log output in the Cloud Logging logs:

Cloud Logging

and you can enumerate the bucket using the gsutil CLI:

gsutil ls -r gs://${ROOT}-explode
gs://dazwilkin-171117-explode/1510952315489/:
gs://dazwilkin-171117-explode/1510952315489/image-1.jpg
gs://dazwilkin-171117-explode/1510952315489/image-2.jpg
gs://dazwilkin-171117-explode/1510952315489/image-3.jpg
gs://dazwilkin-171117-explode/1510952315489/image-4.jpg
gs://dazwilkin-171117-explode/1510952315489/image-5.jpg
gs://dazwilkin-171117-explode/1510952315489/image-6.jpg
gs://dazwilkin-171117-explode/1510952315489/image-7.jpg

Command-line

Cloud Functions is fully supported by the Cloud SDK (“gcloud”). Assuming you have index.js and package.json in the current directory, you can deploy from there. You do *not* need to npm install the packages; the two files alone are sufficient:

gcloud beta functions deploy function-1 \
--entry-point=processZip \
--stage-bucket=${ROOT}-staging \
--trigger-bucket=${ROOT}-receive \
--project=${PROJECT}
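
If you also want to set the memory and timeout from the command line (mirroring the Console’s “Advanced options”), more recent releases of the SDK accept flags for these; the exact flags available to the 2017-era beta command may differ, so check gcloud beta functions deploy --help before relying on this sketch:

gcloud beta functions deploy function-1 \
--entry-point=processZip \
--stage-bucket=${ROOT}-staging \
--trigger-bucket=${ROOT}-receive \
--memory=256MB \
--timeout=540s \
--project=${PROJECT}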

Conclusion

Hopefully this will trigger some ideas around other uses of Cloud Functions. Once again, this is a proof-of-concept only and needs more work before it would be usable for anything beyond experimentation.

Tidy-up

You can delete Cloud Functions individually:

gcloud beta functions delete function-1 \
--project=${PROJECT} \
--quiet

You may delete buckets after recursively deleting all of their objects — please be VERY careful using these commands:

gsutil rm -r gs://${ROOT}-receive
gsutil rm -r gs://${ROOT}-explode

Alternatively, you can simply delete the project, which will delete everything within it too:

gcloud projects delete ${PROJECT} --quiet

Thanks!
