Use Cases and Different Ways to Get Files Into Google Cloud Storage

Mike Kahn
Google Cloud - Community
6 min read · May 31, 2017

Including App Engine with Firebase Resumable Uploads

For this article I will break down a few different ways to interact with Google Cloud Storage (GCS). The GCP docs describe the following ways to upload your data: via the UI, via the gsutil CLI tool, or via the JSON API in various languages. Below I'll go over a few specific use cases and approaches for these ingress options to GCS.

1. Upload form on Google App Engine (GAE) using the JSON API
Use case: public upload portal (small files)
2. Upload form with Firebase on GAE using the JSON API
Use case: public upload portal (large files), uploads within a mobile app
3. gsutil command line tool integrated with scripts or schedulers like cron
Use case: backups/archive, integration with scripts, migrations
4. S3/GCP-compatible file management programs such as Cyberduck
Use case: cloud storage management via desktop, migrations
5. Cloud Function (GCF)
Use case: integration, reacting to changes in buckets, HTTP requests
6. Cloud Console
Use case: cloud storage management via the browser, migrations

1. App Engine Node.js with the JSON API for smaller files

You can fairly easily launch a small Node.js app on GAE that accepts files of up to roughly 20MB directly into GCS. I started with the Node.js GCS sample for GAE on the GCP GitHub account here.

This is a nice solution for integrating uploads of around 20MB. Just remember that the nginx servers behind GAE have a file upload limit, so if you try to upload something larger, say around 50MB, you'll receive an nginx error: ☹️

Nginx file upload error

You can try raising the file size limit in the JS file, but the web servers behind GAE will still enforce their own upload limit. So if you plan to create an upload form on App Engine, be sure to enforce a file size limit in your UI, as in the sketch below.
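The sketch below is modeled on the GAE Node.js storage sample and assumes the express, multer, and @google-cloud/storage packages; the bucket name and the 20MB cap are illustrative placeholders, not values from the sample.

const express = require('express');
const Multer = require('multer');
const Storage = require('@google-cloud/storage');

const storage = Storage();
const bucket = storage.bucket('my-upload-bucket'); // hypothetical bucket name

// Cap uploads in the app itself so users get a clean error instead of
// an nginx failure from the servers behind GAE.
const multer = Multer({
  storage: Multer.memoryStorage(),
  limits: { fileSize: 20 * 1024 * 1024 } // 20MB
});

const app = express();

app.post('/upload', multer.single('file'), (req, res, next) => {
  if (!req.file) {
    return res.status(400).send('No file uploaded.');
  }
  // Stream the in-memory buffer to a new GCS object.
  const blob = bucket.file(req.file.originalname);
  const blobStream = blob.createWriteStream();
  blobStream.on('error', next);
  blobStream.on('finish', () => {
    res.status(200).send(`Uploaded to gs://${bucket.name}/${blob.name}`);
  });
  blobStream.end(req.file.buffer);
});

app.listen(8080);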

Node.js upload form for small files — I'll likely take this app down at some point.

2. App Engine Node.js and Firebase with the JSON API and resumable uploads for large files

Since the previous example only works for smaller files, I wondered how we could solve for uploading larger files, say 100MB or 1GB. I started with the Node.js App Engine storage example here.

After attempting to use resumable uploads in the GCS API with TUS and failing, I enlisted my friend Nathan @ www.incline.digital to help with another approach.

With Nathan's help we integrated resumable uploads with the Firebase SDK. The code can be found here:
https://github.com/mkahn5/gcloud-resumable-uploads.

Reference: https://firebase.google.com/docs/storage/web/upload-files

User interaction with the Firebase powered GAE Upload Form

While it's not very elegant, with no status bar or anything fancy, this solution does work for uploading large files from the web. 🙌🏻
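For reference, here is a minimal sketch of a browser-side resumable upload with the Firebase Web SDK of that era, following the upload-files doc linked above; this is an approximation, not the exact code from the repo, and the config values, element ID, and upload path are placeholders.

// Initialize Firebase with your project's values (placeholders here).
firebase.initializeApp({
  apiKey: '<API_KEY>',
  storageBucket: '<PROJECT_ID>.appspot.com'
});

// Grab the file chosen in an <input type="file"> element.
const file = document.querySelector('#file-input').files[0];
const storageRef = firebase.storage().ref().child('uploads/' + file.name);

// put() returns an UploadTask; for large files the SDK performs a
// resumable upload under the hood and reports progress as it goes.
const task = storageRef.put(file);
task.on('state_changed',
  (snapshot) => {
    const pct = (snapshot.bytesTransferred / snapshot.totalBytes) * 100;
    console.log('Upload is ' + pct.toFixed(1) + '% done');
  },
  (error) => console.error('Upload failed:', error),
  () => console.log('Upload complete')
);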

Resumable File upload form on GAE — I’ll likely take this app down at some point.
GCS in the Firebase UI

3. gsutil from a local or remote machine

gsutil makes it easy to copy files to and from Cloud Storage buckets.

Just make sure you have the Google Cloud SDK on your workstation or remote server (https://cloud.google.com/sdk/downloads), set your project, and authenticate; that's it.

mkahnucf@meanstack-3-vm:~$ gsutil ls
gs://artifacts.testing-31337.appspot.com/
gs://staging.testing-31337.appspot.com/
gs://testing-31337-public/
gs://testing-31337.appspot.com/
gs://us.artifacts.testing-31337.appspot.com/
gs://vm-config.testing-31337.appspot.com/
gs://vm-containers.testing-31337.appspot.com/
mkahnucf@meanstack-3-vm:~/nodejs-docs-samples/appengine/storage$ gsutil cp app.js gs://testing-31337-public
Copying file://app.js [Content-Type=application/javascript]...
/ [1 files][  2.7 KiB/  2.7 KiB]
Operation completed over 1 objects/2.7 KiB.

More details here.

gsutil makes it easy to automate backups of directories, sync changes in directories, back up database dumps, and integrate with apps or schedulers for scripted file uploads to GCS.

Below is the rsync cron job I have for my Cloud Storage bucket and the HTML files on my blog. This way I have consistency between my GCS bucket and my GCE instances whether I upload a file via www or via the GCS UI.

Using gsutil on GCP to backup and sync GCS files
root@mkahncom-instance-group-multizone-kr5q:~# crontab -l
*/2 * * * * gsutil rsync -r /var/www/html gs://mkahnarchive/mkahncombackup
*/2 * * * * gsutil rsync -r gs://mkahnarchive/mkahncombackup /var/www/html

4. Cyberduck (macOS) or any application with an S3 interface

Enjoy an FTP-client-style experience for GCS with Cyberduck on macOS.

Cyberduck has very nice OAuth integration for connecting to the GCS API built into the interface.

After authenticating with OAuth you can browse all of your buckets and upload to them via the Cyberduck app. It's a nice option to have for moving many directories or folders into multiple buckets.

More info on Cyberduck here.

5. Cloud Function

You can also configure a Google Cloud Function (GCF) to upload files to GCS from a remote or local location. The tutorial below just uploads the files in a directory to GCS: deploy the Cloud Function, and gcloud zips the local directory's files and puts the zip into the GCS stage bucket.

Try the tutorial:
https://cloud.google.com/functions/docs/tutorials/storage
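For context, here is a sketch of what the tutorial's helloGCS function looks like in the 2017-era Node.js background-function style: it inspects the object-change event and logs what happened, producing log lines like the ones shown further below. Treat it as an approximation of the tutorial code rather than a verbatim copy.

// Triggered by object changes in the bucket given via --trigger-bucket.
exports.helloGCS = (event, callback) => {
  const file = event.data; // the GCS object that changed

  if (file.resourceState === 'not_exists') {
    // The object was deleted.
    console.log(`File ${file.name} deleted.`);
  } else if (file.metageneration === '1') {
    // metageneration is 1 the first time an object is created.
    console.log(`File ${file.name} uploaded.`);
  } else {
    // Any later change to the object is a metadata update.
    console.log(`File ${file.name} metadata updated.`);
  }

  callback();
};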

Michaels-iMac:gcf_gcs mkahnimac$ gcloud beta functions deploy helloGCS --stage-bucket mike-kahn-functions --trigger-bucket mikekahn-public-upload
Copying file:///var/folders/kq/5kq2pt090nx3ghp667nwygz80000gn/T/tmp6PXJmJ/fun.zip [Content-Type=application/zip]…
- [1 files][ 634.0 B/ 634.0 B]
Operation completed over 1 objects/634.0 B.
Deploying function (may take a while — up to 2 minutes)…done.
availableMemoryMb: 256
entryPoint: helloGCS
eventTrigger:
eventType: providers/cloud.storage/eventTypes/object.change
resource: projects/_/buckets/mikekahn-public-upload
latestOperation: operations/bWlrZS1rYWhuLXBlcnNvbmFsL3VzLWNlbnRyYWwxL2hlbGxvR0NTL1VFNmhlY1RZQV9j
name: projects/mike-kahn-personal/locations/us-central1/functions/helloGCS
serviceAccount: mike-kahn-personal@appspot.gserviceaccount.com
sourceArchiveUrl: gs://mike-kahn-functions/us-central1-helloGCS-wghzlmkeemix.zip
status: READY
timeout: 60s
updateTime: '2017-05-31T03:08:05Z'

You can also read the function's logs to see activity in the bucket. Below shows a file uploaded via my public upload form and then deleted via the console UI. This could be handy for Pub/Sub notifications or for reporting.

Michaels-iMac:gcf_gcs mkahnimac$ gcloud beta functions logs read helloGCS
LEVEL  NAME      EXECUTION_ID     TIME_UTC                 LOG
D      helloGCS  127516914299587  2017-05-31 03:46:19.412  Function execution started
I      helloGCS  127516914299587  2017-05-31 03:46:19.502  File FLIGHTS BANGKOK.xlsx metadata updated.
D      helloGCS  127516914299587  2017-05-31 03:46:19.523  Function execution took 113 ms, finished with status: 'ok'
D      helloGCS  127581619801475  2017-05-31 18:31:00.156  Function execution started
I      helloGCS  127581619801475  2017-05-31 18:31:00.379  File FLIGHTS BANGKOK.xlsx deleted.
D      helloGCS  127581619801475  2017-05-31 18:31:00.478  Function execution took 323 ms, finished with status: 'ok'

Cloud Functions can come in handy for background tasks like regular maintenance triggered by events on your GCP infrastructure or by activity in HTTP applications. Check out the how-to guides for writing and deploying Cloud Functions here.

6. Cloud Console UI

The UI works well for GCS administration. GCP even has a transfer service for importing files from S3 buckets on AWS or other S3-compatible storage elsewhere. One thing the portal currently lacks is object lifecycle management, which is nice for automatically archiving infrequently accessed files, or files over a certain age, to cheaper Coldline object storage. For now you can only modify object lifecycle policies via gsutil or via the API (sketched below). Like most GCP features, they start at the function/API level and then make their way into the portal (the way it should be, IMO), and I'm fine with that. I expect object lifecycle rules to be implemented in the GCP portal at some point in the future. 😃
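To make the gsutil/API route concrete, here is a minimal sketch of setting a lifecycle rule with the Node.js client library. It assumes the @google-cloud/storage package; the bucket name and the 365-day age threshold are placeholder values.

const Storage = require('@google-cloud/storage');
const storage = Storage();

// Archive objects older than a year to cheaper Coldline storage.
storage.bucket('my-archive-bucket').setMetadata({
  lifecycle: {
    rule: [{
      action: { type: 'SetStorageClass', storageClass: 'COLDLINE' },
      condition: { age: 365 } // days since object creation
    }]
  }
}, (err) => {
  if (err) {
    console.error('Failed to set lifecycle policy:', err);
  } else {
    console.log('Lifecycle policy applied.');
  }
});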

GCS UI

In summary, I've used a few of the available GCP samples and tutorials to demonstrate different ways to get files into GCS. GCS is flexible, with many ingress options that can be integrated into systems or applications quite easily! In 2017 the use cases for object storage are abundant, and GCP makes it easy to send and receive files in GCS.

Leave a comment with any interesting use cases for GCS that I may have missed or that we should explore. Thanks!

Check my blog for more updates.

Mike Kahn
Field Engineering Manager, Databricks. All views and opinions are my own. @mkahn5