How to keep your app serverless with Google Sheets as a data-storage

Mikhail Ageev
Analytics Vidhya
Published in
4 min readApr 10, 2020
Going serverless with Python, Google Sheets and Cloud Functions

This is the 3rd article in a series about covering COVID-19 worldwide spreading from a data-journalism perspective. The first article was a general description of the solution behind our interactive map. The second was about working with various data-sources and rendering custom features on map.

And this one is about technical aspects of building a serverless app on Python for fetching, combining, storing and serving JSON data.

Cloud Function

Initially, the scrapping app was intended to be executed as a simple Python script using cli, as Google Cloud Function, Amazon Lambda or a Docker Container. And it worked fine on my Ubuntu server as a crontab job.

However, there was an obstacle with manual data input that required me to update COVID-19 data for Kosovo and Belarus in source code and then pushing it to repo. That’s not a convenient and resilient solution.

But, how to store and update this data keeping the app as lightwight as it is. How to make the process of updating the data smooth for people without tech background (journalists)?

There is a solution.

Google Spreadsheet as a Storage Engine

Imagine, you have to store and update data that is regularly used by your app. The data has to be securely updated by various persons and to have a user-friendly interface.

Of course you may spend few hours to setup a whole virtual machine, configuring webserver and web framework for this solution, manage user credentials.

But, what if we simply use a Google Spreadsheet as a storage engine and API-layer?

Spreadsheet Key Benefits

  • Access Control Model
  • Well-known UI and workflow
  • Data validation
  • Data export as a CSV via URL (we ignore Google Sheets API here)

The Solution

Here is our data-storage in Google Spreadsheet:

Google Spreadsheet Data Storage

To make it easier for users to update the data and to avoid input errors, we may implement Data validation and formating for columns.

Each field in COUNTRY column becomes a dropdown with a list of countries from another sheet. Kinda of ForeignKey relationship in your model without unique indexes.

Setting list of values for a column

Now we have to make the Spreadsheet available as a CSV-file via URL: File → Publish to the web.

Publishing Google Spreadsheet as a CSV

Done! Now you have a URL with your data in a readable format. You may share the Spreadsheet with designated people and let Google to manage Authentication and Authorization for you.

Obviously, it’s possible to use Google Sheets API to fetch the data. However, CSV export seems to me much more faster and easier in this particular case.

Python Cloud Function

The last thing to do — is to write some code. Here is a primitive function that fetches content of the Spreadsheet and returns a list of records:

Fetching and parsing data from Google Spreadsheet with Python

However, that’s not enough for a serverless app. We have to fetch data from CSSE at JHU ArcGIS map feed and github repo, Worldometer website and, finally, combine the data with our Google Sheets source.

So, we starting with the installation of dependences (see requirements.txt). To store our JSON output in a Amazon S3-type Storage we have to set AWS_* environment variables. Otherwise, the app would try to store data in Redis as a fall-back to Cloud.

COVID-19 Data parser for Google Cloud/Amazon Lambda with support of Redis and Amazon S3-type Cloud Storage

In the gist above Google Spreadsheet URL moved into environment variable to make the app more flexible.

Running Google Cloud Function

Adding your Python function into Cloud Console is quite obvious:

  1. Select how to trigger function execution (e.g. Cloud Pub/Sub Topic)
  2. Select source for your code (in our case we use inline editing)
  3. Copy & Paste function code and provide list of required packages
  4. Set environment variables (we set MANUAL_DATA_SOURCE_URL with a URL to Google Sheet) and click Deploy

Thats all.

Deploying Python code as a Google Cloud Function
Log of running update_covid19_data as a Google Cloud Function

Going Docker with Redis Support

You may want to go a step further and wrap the whole thing as a Docker Container. No problem folks.

Docker Container for running Python app

Thanks for reading! You may find detailed readme, all sources of the project and Docker configuration for a Ubuntu-container with a crontab job on github.

--

--