Use Cases and Different Ways to get Files Into Google Cloud Storage
Including App Engine with Firebase Resumable Uploads
For this article I will break down a few different ways to interact with Google Cloud Storage (GCS). The GCP docs list the following ways to upload your data: via the UI, via the gsutil CLI tool, or via the JSON API in various languages. Below I’ll go over a few specific use cases and approaches for getting files into GCS.
1. upload form on Google App Engine (GAE) using the JSON API
Use case: public upload portal (small files)
2. upload form with Firebase on GAE using the JSON API
Use case: public upload portal (large files), uploads within mobile app
3. gsutil command line integrated with scripts or schedulers like cron
Use case: backups/archive, integration with scripts, migrations
4. S3 / GCP compatible file management programs such as Cyberduck
Use case: cloud storage management via desktop, migrations
5. Cloud function (GCF)
Use case: Integration, changes in buckets, HTTP requests
6. Cloud console
Use case: cloud storage management via desktop, migrations
1. App Engine nodejs with JSON API for smaller files
You can pretty easily launch a small nodejs app on GAE that accepts smaller files (around 20MB) and writes them directly to GCS. I started with the nodejs GCS sample for GAE on the GCP github account here.
This is a nice solution for integrating uploads of around 20MB. Just remember the nginx servers behind GAE have a file upload limit, so if you try to upload something around 50MB, you’ll receive an nginx error: ☹️
You can try to raise the file size limit in the js file, but the web servers behind GAE will still enforce their own limit on file uploads. So, if you plan to create an upload form on App Engine, be sure to enforce a file size limit in your UI.
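To make the size check concrete, here is a minimal sketch of the idea. It is written in Python for illustration only (the sample app itself is nodejs), and the 20MB cap plus the names `MAX_UPLOAD_BYTES` and `check_upload_size` are my own assumptions, not values taken from GAE or from the sample.

```python
from typing import Tuple

# Assumed cap for this sketch; not an official App Engine limit.
MAX_UPLOAD_BYTES = 20 * 1024 * 1024


def check_upload_size(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> Tuple[bool, str]:
    """Return (accepted, message) for a file of the given size in bytes."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes > limit:
        limit_mb = limit / (1024 * 1024)
        # Reject early with a readable message instead of letting the
        # web server in front of the app return its own error page.
        return False, f"File is too large; the limit is {limit_mb:.0f} MB."
    return True, "OK"
```

Running the same check in the browser before the request is sent gives the user feedback without waiting for a failed upload.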
2. App Engine nodejs Firebase with JSON API and Resumable uploads for large files
Since the previous example only works for smaller files, I wondered how we could handle larger files, say 100MB or 1GB. I started with the nodejs App Engine storage example here.
With the help of Nathan we integrated resumable uploads with the Firebase SDK. Code can be found here.
While not very elegant, with no status bar or anything fancy, this solution does work for uploading large files from the web. 🙌🏻
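The Firebase SDK hides the mechanics, but the idea behind a resumable upload is that the file goes up in chunks, each tagged with a `Content-Range` header saying which bytes it carries. Below is a rough Python sketch of just that chunking step, not the code from our app. The function name `chunk_ranges` and the default chunk size are assumptions of mine; the 256 KiB multiple comes from the GCS resumable upload docs.

```python
from typing import Iterator, Tuple

# GCS asks for chunk sizes in multiples of 256 KiB (the last chunk may be smaller).
CHUNK_MULTIPLE = 256 * 1024


def chunk_ranges(total_size: int,
                 chunk_size: int = 8 * CHUNK_MULTIPLE) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, content_range_header) for each chunk of an upload."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if chunk_size <= 0 or chunk_size % CHUNK_MULTIPLE != 0:
        raise ValueError("chunk_size must be a positive multiple of 256 KiB")
    start = 0
    while start < total_size:
        # Byte ranges are inclusive, so the last byte index is one less
        # than the running total.
        end = min(start + chunk_size, total_size) - 1
        yield start, end, f"bytes {start}-{end}/{total_size}"
        start = end + 1
```

If the connection drops, a client only needs to find out how many bytes the server already has and continue from the next offset, which is what makes this approach workable for 100MB or 1GB files.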
3. gsutil from local or remote
gsutil makes it easy to copy files to and from cloud storage buckets.
Just make sure you have the Google Cloud SDK on your workstation or remote server (https://cloud.google.com/sdk/downloads), set your project, and authenticate, and that’s it.
mkahnucf@meanstack-3-vm:~$ gsutil ls
mkahnucf@meanstack-3-vm:~/nodejs-docs-samples/appengine/storage$ gsutil cp app.js gs://testing-31337-public
/ [1 files][ 2.7 KiB/ 2.7 KiB]
Operation completed over 1 objects/2.7 KiB.
More details here.
gsutil makes it easy to automate backups of directories, sync changes in directories, back up database dumps, and integrate with apps or schedulers for scripted file uploads to GCS.
Below is the rsync cron I have for my cloud storage bucket and the html files on my blog. This way my GCS bucket and my GCE instances stay consistent whether I upload a file via www or via the GCS UI.
root@mkahncom-instance-group-multizone-kr5q:~# crontab -l
*/2 * * * * gsutil rsync -r /var/www/html gs://mkahnarchive/mkahncombackup
*/2 * * * * gsutil rsync -r gs://mkahnarchive/mkahncombackup /var/www/html
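If you would rather generate these entries from a script than edit the crontab by hand, something like the following Python sketch could work. The helper names `rsync_command` and `cron_line` are hypothetical; the only gsutil options used are `-r` (recursive) and `-n` (dry run), and nothing here actually runs gsutil.

```python
import shlex
from typing import List


def rsync_command(source: str, destination: str, dry_run: bool = False) -> List[str]:
    """Build the argv for a recursive gsutil rsync from source to destination."""
    cmd = ["gsutil", "rsync", "-r"]
    if dry_run:
        # -n makes gsutil report what it would copy without copying anything.
        cmd.append("-n")
    cmd += [source, destination]
    return cmd


def cron_line(schedule: str, cmd: List[str]) -> str:
    """Format a crontab entry from a schedule expression and an argv list."""
    return f"{schedule} {shlex.join(cmd)}"


if __name__ == "__main__":
    # Paths and bucket name are the ones from my crontab above.
    local_dir = "/var/www/html"
    bucket_dir = "gs://mkahnarchive/mkahncombackup"
    print(cron_line("*/2 * * * *", rsync_command(local_dir, bucket_dir)))
    print(cron_line("*/2 * * * *", rsync_command(bucket_dir, local_dir)))
```

Passing `dry_run=True` first is a cheap way to see what a new sync pair would copy before it goes live in cron.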
4. Cyberduck (MacOS) or any application with an S3 interface
Enjoy an FTP-client type of experience with Cyberduck on MacOS for GCS.
Cyberduck has very nice OAuth integration for connecting to the GCS API built into the interface.
After authenticating with OAuth you can browse all of your buckets and upload to them via the Cyberduck app. It’s a nice option to have for moving many directories or folders into multiple buckets.
More info on Cyberduck here.
5. Cloud Function
You can also configure a Google Cloud Function (GCF) to upload files to GCS from a remote or local location. The tutorial below only covers uploading the files in a directory to GCS. Run the deploy command and it zips the files in a local directory and puts the zip into the GCS stage bucket.
Try the tutorial:
Michaels-iMac:gcf_gcs mkahnimac$ gcloud beta functions deploy helloGCS --stage-bucket mike-kahn-functions --trigger-bucket mikekahn-public-upload
Copying file:///var/folders/kq/5kq2pt090nx3ghp667nwygz80000gn/T/tmp6PXJmJ/fun.zip [Content-Type=application/zip]…
- [1 files][ 634.0 B/ 634.0 B]
Operation completed over 1 objects/634.0 B.
Deploying function (may take a while — up to 2 minutes)…done.
You can also use the cloud functions you created to display bucket logs. The output below shows a file that was uploaded via my public upload form and then deleted via the console UI. This could be handy for pub/sub notifications or for reporting.
Michaels-iMac:gcf_gcs mkahnimac$ gcloud beta functions logs read helloGCS
LEVEL NAME EXECUTION_ID TIME_UTC LOG
D helloGCS 127516914299587 2017-05-31 03:46:19.412 Function execution started
I helloGCS 127516914299587 2017-05-31 03:46:19.502 File FLIGHTS BANGKOK.xlsx metadata updated.
D helloGCS 127516914299587 2017-05-31 03:46:19.523 Function execution took 113 ms, finished with status: 'ok'
D helloGCS 127581619801475 2017-05-31 18:31:00.156 Function execution started
I helloGCS 127581619801475 2017-05-31 18:31:00.379 File FLIGHTS BANGKOK.xlsx deleted.
D helloGCS 127581619801475 2017-05-31 18:31:00.478 Function execution took 323 ms, finished with status: 'ok'
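For the reporting idea, the log output above is regular enough to parse with a few lines of code. Here is a small Python sketch that assumes the five-column layout shown in that output (LEVEL, NAME, EXECUTION_ID, TIME_UTC, LOG); the function name `parse_function_logs` is mine, and the layout may differ in other gcloud versions.

```python
from typing import Dict, List

# A few lines copied from the log output above.
SAMPLE_LOGS = """\
LEVEL NAME EXECUTION_ID TIME_UTC LOG
D helloGCS 127516914299587 2017-05-31 03:46:19.412 Function execution started
I helloGCS 127516914299587 2017-05-31 03:46:19.502 File FLIGHTS BANGKOK.xlsx metadata updated.
I helloGCS 127581619801475 2017-05-31 18:31:00.379 File FLIGHTS BANGKOK.xlsx deleted.
"""


def parse_function_logs(text: str) -> List[Dict[str, str]]:
    """Split each log line into level, name, execution id, timestamp and message."""
    records = []
    for line in text.splitlines():
        # TIME_UTC is a date plus a time, so a data line has six
        # whitespace-separated fields; the header line has only five.
        parts = line.split(maxsplit=5)
        if len(parts) < 6:
            continue
        level, name, execution_id, date, time, message = parts
        records.append({
            "level": level,
            "name": name,
            "execution_id": execution_id,
            "time_utc": f"{date} {time}",
            "log": message.strip(),
        })
    return records


if __name__ == "__main__":
    # Print only the info-level messages, i.e. the file events.
    for record in parse_function_logs(SAMPLE_LOGS):
        if record["level"] == "I":
            print(record["time_utc"], record["log"])
```

From here the records could be filtered by level or execution id and pushed wherever your reporting lives.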
Cloud Functions can come in handy for background tasks like regular maintenance from events on your GCP infrastructure or from activity on HTTP applications. Check out the how-to guides for writing and deploying cloud functions here.
6. Cloud Console UI
The UI works well for GCS administration. GCP even has a transfer service for files in S3 buckets on AWS or in other S3 buckets elsewhere. One thing currently lacking in the portal is object lifecycle management, which is nice for automatically archiving infrequently accessed files, or files over a certain age, to cheaper Coldline object storage. For now you can only modify object lifecycle via gsutil or via the API. Like most GCP features, it starts at the function/API level and then makes its way into the portal (the way it should be IMO), and I’m fine with that. I expect object lifecycle rules to be implemented in the GCP portal at some point in the future. 😃
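Until the portal catches up, a lifecycle rule is just a small JSON document that you hand to `gsutil lifecycle set`. Here is a Python sketch that generates one for the Coldline archiving case; the function name `coldline_lifecycle_config` and the 90 day threshold in the usage are assumptions for illustration, so adjust them to your own retention needs.

```python
import json


def coldline_lifecycle_config(age_days: int) -> str:
    """Return lifecycle JSON that moves objects older than age_days to Coldline."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    config = {
        "rule": [
            {
                # Change the storage class instead of deleting the object.
                "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
                # "age" is the number of days since the object was created.
                "condition": {"age": age_days},
            }
        ]
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Assumed threshold of 90 days; write the output to a file and apply it
    # with: gsutil lifecycle set <file> gs://<your-bucket>
    print(coldline_lifecycle_config(90))
```

Keeping the rule in a generated file also means it can live in version control alongside the rest of your scripts.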
In summary, I’ve used a few of the available GCP samples and tutorials to show different ways to get files onto GCS. GCS is flexible, with many ingress options that can be integrated into systems or applications quite easily! In 2017 the use cases for object storage are abundant, and GCP makes it easy to send and receive files in GCS.
Leave a comment with any interesting use cases for GCS that I may have missed or that we should explore. Thanks!
Check my blog for more updates.