Update BigQuery columns description from Google Spreadsheet

Alan Vainsencher
Plarium-engineering
2 min readJan 26, 2021

--

At my company, we heavily rely on Google Cloud. Specifically for datawarehousing we use BigQuery. With thousands of tables in it, we need to do a good job in keeping metadata tidy and clean.

In order to do this, our product data team has different Google spreadsheets so they can manage the column description for every field of every table. Product don’t have direct access or permissions to edit directly on BigQuery, so they do it in a spreadsheet like this one:

The schema is completely made up for this article. Don’t try to find any logic in it :)

With this spreadsheet, we have processes working in the background that update BigQuery columns description with the data we have in the spreadsheets. These processes use many python libraries Google provides in order to communicate with their products and some other open source libraries as well.

First, we need some methods to talk to the Google spreadsheet and retrieve the fields and their descriptions. We use the library gspread.

import gspread
from oauth2client.service_account import ServiceAccountCredentials
class GoogleSheetHandler():
def __init__(self):
self.gc = gspread.authorize(ServiceAccountCredentials.from_json_keyfile_name("c://PATH_TO_YOUR_SECRET/credentials.json", ['https://spreadsheets.google.com/feeds',
'https://www.googleapis.com/auth/drive']))
def get_spread_sheet(self,spread_sheet_id,worksheet):
self.spread_sheet = self.gc.open_by_key(spread_sheet_id)
self.worksheet = self.spread_sheet.worksheet(worksheet)
def get_all_values_by_col(self,col_id):
return self.worksheet.col_values(col_id)

Make sure you give read permissions in the spreadsheet to the user that is on your credentials.json.

Then we have two helper functions and a main one to update the fields. The tricky and fun part here is that this main function is recursive, so it supports any level of nested Record types BigQuery provides.

We have a BigQuery table like this:

and calling this function:

update_description_columns_bqtable_from_gsheet(project, dataset,table_name,"YOUR_SPREADSHEET_ID","YOUR_WORKSHEEET_NAME",1,4)

then we get this:

Please make sure your fields in the spreadsheet are with the same case as of the columns in BigQuery. While writing this article, I had done the mistake of putting different cases, and of course it didn’t work :)

Happy bigquerying :)

--

--