Update Automation for Google Sheets
Problem Definition
By using an online productivity suite such as Google Sheets, you can create and edit spreadsheets directly on your web browser. Multiple people can work simultaneously and you can see other people’s changes in real-time. Every change is saved automatically. Many companies are still using Google Sheets as their daily documents, to deliver some reports. How about your company?
If yes, then maybe you have across a problem where the data in the Google Sheets need to be updated daily to be aligned with other google sheets or other sources. In this case, you need to update the spreadsheet manually every single day to add to your frustration. It is not a major problem if the amount of data is small. However, this becomes a laborious task if you need to check hundreds of rows with ten or more columns, or if you need to maintain more than three sheets. This will become an immediate nightmare.
Solution :
You can create an automation job to handle the task using the Google Sheet API in various ways. For example, you can make automation to append new data, update the data, or truncate the data in order to adjust to your company's business model. There are several tools available for job automation, one of them is Apache NiFi.
First of All, we will breakdown the Google Sheet API :
Google already provide us with an API to let us read, write, and format Google Sheets data with various programming languages, including Java, JavaScript, and Python. For further details, you can find and read it here: google API reference.
In this article, I will provide a simple example to update the data in Google Sheets using Google Sheets REST API (batchUpdateByDataFilter method). You can customize it with other methods in Google Sheets API in order to fit your business requirements (e.g., append-only, truncate row or column, delete data, etc).
Update Data
You can use this API endpoint to update one or more rows or columns in your sheet.
POST https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}/values:batchUpdate
spreadsheetId
refers to the ID of your spreadsheet. You can get it from the URL of your spreadsheet, For example :
then, you create the request body for the message you intend to send to the above API batchUpdate
endpoint:
{
"data": [
{
"majorDimension": "COLUMNS",
"range": "Sheet1!A4:C4",
"values": [
[
"3"
],
[
"OR3"
],
[
"paid"
]
]
}
],
"valueInputOption": "USER_ENTERED"
}
data.majorDimension
refers to the dimension of the data array. In this example, I useCOLUMNS
value because I want to append the data array as one row.range
refers to the range the values cover.values
refers to the value of the data that was read or to be written.valueInputOption
refers to how the input data should be interpreted.USER_ENTERED
means the values will be parsed as if the user typed them into the Google Sheets UI.ValueInputOption
will be rules when the input data should be interpreted. this could beRAW
orUSER_ENTERED
.RAW
: when you want the data will be stored as-is.USER_ENTERED
: when you want the data will be parsed as if the user typed them into the UI.
Example:
This is a Google Sheets and the sheet name Sheet1
After the HTTP request is sent using the body I have created above, the end result of the spreadsheet change will be like this:
Update the order_id
OR3 and will not change the notes column.
You can try with the CURL below, change the {spreadsheetId}
with the id of your spreadsheet. You also need to include the access_token, as this API need authorization.
curl --location -g --request POST 'https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}/values:batchUpdate' \
--header 'Authorization: Bearer {access_token}' \
--header 'Content-Type: application/json' \
--data-raw '{
"data": [
{
"majorDimension": "COLUMNS",
"range": "Sheet1!A4:C4",
"values": [
[
"3"
],
[
"OR3"
],
[
"paid"
]
]}],
"valueInputOption": "USER_ENTERED"
}'
In order to create access_token you can read this documentation:
https://developers.google.com/identity/protocols/oauth2#1.-obtain-oauth-2.0-credentials-from-the-dynamic_data.setvar.console_name-.
The access_token requires one of the following OAuth scopes:
- https://www.googleapis.com/auth/drive
For more information, please refers to the OAuth 2.0 Overview.
- https://www.googleapis.com/auth/drive.file
— https://www.googleapis.com/auth/spreadsheets
Append Data
To append the new data to Google Sheets, we can use this HTTP request:
{
"majorDimension": "ROWS",
"values": [
[
"4",
"OR4",
"pending"
]
]
}
You can try with the CURL sample below; make sure to change the {spreadsheetId}
with the id of your spreadsheet and {range}
with the range for the sheet that you want (e.g., Sheet1!A:C
). Keep in mind, as this API request needs authorization, you need to include the access_token.
curl --location -g --request POST 'POST https://sheets.googleapis.com/v4/spreadsheets/{spreadsheetId}/values/{range}:append' \
--header 'Authorization: Bearer {access_token}' \
--header 'Content-Type: text/plain' \
--data-raw '{
"majorDimension": "ROWS",
"values": [
[
"4",
"OR4",
"pending"
]
]
}'
Create Automation in NiFi :
After we know the API, now we will create the automation and process flow to generate and hit the API on schedule. For this purpose, we use Apache NiFi, which offers a web-based user interface for the design, control, feedback, and monitoring of the dataflow.
General NiFi Steps :
- Generate Flowfile
- Get data from the source, e.g.: MySQL, PostgreSQL, MongoDB, other Sheet
- Delete, append, or update the data based on the source; you can customize it to handle your needs.
I will walk you through using this simple example. Let's say we are still using GSheet as a daily data checker, and the GSheet get the data from the MySQL table.
Sample data of transaction table:
Sample data of order status table:
Sample of initial GSheet data:
The GSheet contain column from table transactions and status, and there is a column Notes, column Notes will be filled by GSheet owner to record a brief description related to the order in the shop.
Step in NiFi:
- Generate Flowfile
- Get the target data from MySQL query result.
- Get Google OAuth : OAuth 2.0 provides authenticated access to an API.
- Build Body: Mapping the data to an array of strings; in this sample, we map the data to be like this:
{
"majorDimension": "ROWS",
"values":
[
["1","OR1","paid"],
["2","OR2","paid"],
["3","OR3","cancel"]
]
}
5. Invoke HTTP : to send request to Google Sheet API
By understanding the Google Sheets API, we can make automation with Apache NiFi, python and others. I hope this article useful for you to get a bigger picture of how to use the Google Sheets API.