My BigQuery stories -Finding Google Sheets(ID) linked with BigQuery table

Bihag Kashikar
intelia
Published in
3 min readOct 29, 2023

Finding Google Sheets(ID) linked with BigQuery table

In this first of many “My BigQuery stories”, I would like to share my knowledge I gained while working in data engineering, specifically in Google Cloud.

In this article, I am sharing one of tricky challenge I came across while working on one of the projects.

Use case:

Google Cloud BigQuery and Google Sheets provide native integration to do basic data exploration — both ways — i.e.

Google Sheets to BigQuery using BigQuery sheets connector, and BigQuery to Google Sheets using BigQuery external tables.

Recently in one my projects, the use of Google Sheets to BigQuery with BigQuery sheets connector usage grew to the point where we were not able to find out which Sheets / Sheet IDs are referenced with which table in BigQuery as there is no visual lineage to find out within Google Sheets or in underlying table. Users of Google sheets would run SELECT query on the table, and in the EXPLORE DATA option, users would use option Explore with Sheets.

This made general housekeeping activities like archiving/deleting google sheets really tricky, furthermore, some sheets were shared with customers which meant we couldn’t really delete those for compliance and record keeping purposes.

Solution:

To find out the lineage, I learnt there are two options other than using python sdks.

Option 1:

To find all “connected sheets” — first step was to query BigQuery information_schema — see below query, the job_id and labels give away the connected sheets queries run on BigQuery table. This way I could identify which tables are used in the Google Sheets, moreover I also noticed that the data registered in the information_schema for all queries initiated from Google Sheets contains job_id value as ‘sheets_dataconnector_%’.

SELECT 
distinct refs.project_id, refs.dataset_id,refs.table_id,job_id,
lbl.key,lbl.value,creation_time
FROM `region-australia-southeast1`.INFORMATION_SCHEMA.JOBS,
unnest(labels) as lbl, unnest(referenced_tables) as refs
WHERE job_id like 'sheets_dataconnector_%'
order by creation_time desc

Below is the output:

Option 2: Manual one sheet at a time.

Step 1: You can pick the sheetId from the open google sheet.

Step 2: and query the Cloud logging — you should be able to find docId as highlighted. Expanding the logs, it gives away the linked table name.

Wrap-up:

This was indeed one of the “cool” things I found out about Google Cloud and BigQuery and thoroughly enjoyed course of my findings. I hope to have helped fellow Google Cloud users who may find this useful.

--

--

Bihag Kashikar
intelia
Editor for

Google Cloud, Data analytics, Data architecture, System design @ intelia Pty. Ltd