How to use google spreadsheet for your backend database
A very common problem of data scraping is
- where should I put the data
- where should I run the program
You can always run your program locally, but I bet you definitely don’t want to run it 24/7 on your own laptop. Putting it on cloud is the option you would probably consider. Different cloud providers usually charge on the resources your program consumes, and besides that the storage charges too. Thanks to Google Spreadsheet and Kintohub now we have another alternative which is completely free!
In this tutorial I am showing a real life use case: download some data in csv format from a remote source (e.g. the coronavirus figures in HK from goverment), and upload it the the google spreadsheet once per hour. We will write a program in node.js and run it on Kintohub as a cronjob. Kintohub allow you to schedule and run your program, which is limited to total 512MB ram for free tier. It also support deploying stateful databases such as Postgres/Mysql etc, but it does cost and we better to stay to the free option: google spreadsheet.
Create google spreadsheet with permissions
If you are familiar with Google Drive API permissions you can skip this section.
In order to allow a program to edit your Google Spreadsheets, Google provides different authentication methods. The best approach for a cronjob nature program would be using the Service Accounts. The following steps will show how we setup the Service Account and download the corresponding credential file credentials.json
.
0. First of all you need to create a google project on gcloud console and enable Google Sheets API
Creating a Gcloud account let you use the different Google services, and don’t worry it is free (assuming there is no huge traffic ). After you enabling it, click the Credentials
on the side bar to create a new credential (which is the service account)
This will prompt you to choose the export format. Choose JSON
and download the credentials.json
. Save this file for later use.
After that you will have the service account created, with Google Sheets API enabled. Check the credentials.json
and you can find an email, which represents the service account itself. Now create a new google spreadsheet and click the share
button on top right to allow access for the service account.
Now your service account should be able to modify your google spreadsheet!
Running your job on Kintohub
This is a working example already running on Kintohub. The job itself is simple
- download the corona virus latest figures from a public csv released from HK Government
- download from our google spread sheet to check if there is new records
- compare the data and only insert the new records
- upload back to the spreadsheet
0. Fork The Repository
Of course you need to write a program first. Or else you can fork this repository to your Github account to try it out first.
1. Register Your Account on Kintohub
Visit https://beta.kintohub.com/sign-up for your new account
2. Connect Git
After signing up the account, just head to Create Kintoblock. Here you will need to setup your Github authorization to allow KintoHub to build from your repositories. For better security I recommend you to specify the exact repository you want to build.
3. Build Your Program
After connecting to your git repository(-ies), it is now the time to create the program. On Kintohub, everything is a KintoBlock. No matter it is a cronjob, a backend API, a website, or even a database. This provides an abstract idea on what you are going to run. You should always pick the right type when you create it. Now for us we should pick the JOB type as it will be scheduled and run periodically.
Pay attention when creating the Jobs. Kintohub now supports only Dockerfile when it is a Job, so make sure your repository contains a Dockerfile. (Don’t worry the example repository already contains one ;) )
After creating the KintoBlock we need to set up some Environment Variables as well. The idea to split the configurations outside from your code is we should always follow the best practice and I really recommend you to read the principles from12 factor app.
For the demo project we need to configure the following environment variables:
CREDENTIAL_JSON
: paste the content of thecredentials.json
you got from the previous step, and remove every newline/spaces (it is recommend to do it on a text editor like sublime text)SPREADSHEET_ID
: the ID of your google spreadsheet. you can obtain in via the url of your spreadsheet
— e.g.https://docs.google.com/spreadsheets/d/1Z4…jfdSLO8/edit#gid=0
the1Z4…jfdSLO8
part is the IDREMOTE_CSV_PATH
:http://www.chp.gov.hk/files/misc/latest_situation_of_reported_cases_wuhan_eng.csv
on top left. For the example project we need to configure the following environment variables:
4. Configure Your Program
Once the build complete, it is the time to run it! Click the Deploy Now button ,it will prompt you to select the projects. On Kintohub you can have multiple projects within the same workspace, and each project you can have multiple environments. This sounds complex, but this is an advanced feature allowing you to have a more production ready workflow, we could definately skip them right now. Click on Create Project button to create a default project now.
Since this is a node-js script, I recommend you to run it with 128MB memory. And one point to notice, if the process run with more memory than the setting here, it will receive a OOM kill immediately and may receive no errors.
You should also check the REPEAT_SCHEDULE
tab. RUN_ONCE
is the jobs type that for pre/post deployment jobs, and which only triggered when you do deployment. The cron pattern , you can either manually input or just use the helper buttons provided below.
5. Run Your Program
Completing the deployment by clicking the Create New Project button. You will see the deployment in progress, and after that your program is scheduled to run! You can check the status of the program by viewing the log via the Logs
button on top right. (You can choose Pre-Deployment or Post-Deployment type for your Job first in order to test-run it)