How can you load a Data Dictionary into BigQuery again and again . . . ?

And give better visibility & understanding of data

Lee Doolan
Qodea Google Cloud Tech Blog
4 min readJul 26, 2022

--

BigQuery data engineers are responsible for creating each element used in a data platform. This includes defining table names, partitioning configurations, column names, data types and descriptions for example.

Much of these elements rely on their technical experience to get right, and once in production cannot be easily changed.

What can be changed is metadata such as description labels.

These can be modified numerous times after inception without risk and are helpful for analysts and other developers to understand a dataset. It is therefore important they are kept current and closely aligned to familiar business object usage and lingo.

Luckily in BigQuery each object has description fields we can populate ‘after the event’ from more reputable sources than us.

This blog post is about how we do that, specifically for datasets, tables and table columns.

Updating a single description field

The simplest way to update a description field of a BigQuery object is using a DDL command.

The code block below describes how you can update a single field for a dataset, table and table column:

But that will soon become an unwieldy way to update many description fields. It’ll be cool if we could do this in bulk, maybe using BigQuery’s procedural language.

Updating description fields in bulk

This will be as simple as setting up a loop through 3 tables containing our data object descriptions i.e. one for the datasets, tables and table columns.

These tables we’ll assume already exist, maybe populated on a nightly basis from an external Data Dictionary tool, wiki etc, or even simple CSV files.

NB for a great starter on BigQuery CSV file loading and associated gremlins please see this excellent blog post from Tim Ellis-Smith, my colleague here at CTS.

Datasets

Assuming we have a table set up similar to below:

Ref_BQ_Datasets

We can loop through this table and set the dataset descriptions using the code below:

Tables

Assuming we have a table set up similar to below:

Ref_BQ_Tables

We can loop through this table and set the table descriptions using the code below:

Table Columns

Assuming we have a table set up similar to below:

Ref_DW_Table_Columns

We can loop through this table and set the table column descriptions using the code below:

Result

The description fields should now be populated and viewable in the BigQuery console as below.

Dataset Object
Table Object

Python Version

If anybody wants to implement this in python you can using the gist code here.

It’s a slight variation on the BigQuery code samples for dataset, table and table column, see links.

Wrapping Up

This is a fairly simply way to load a data dictionary into BigQuery giving your analysts better visibility and understanding of new datasets and fields.

Best of all the Data Engineers can be removed as owner of this important business information and leave describing data objects to the real data owners.

An added bonus is all that lovely metadata, will be auto added and searchable from the data catalog of Dataplex too, if configured.

Dataplex is a Google service to help us search for insightful data, and understand and make data useful. Take a look at this useful guide on what it does and why it can be useful to you.

We’ve also learnt how to loop through a table and run dynamically built DDL SQL statements along the way which can be reused for many other tasks.

I hope this is useful, and please feel free to drop me a line for ideas on future posts.

About CTS

CTS is the largest dedicated Google Cloud practice in Europe and one of the world’s leading Google Cloud experts, winning 2020 Google Partner of the Year Awards for both Workspace and GCP.

We offer a unique full stack Google Cloud solution for businesses, encompassing cloud migration and infrastructure modernisation. Our data practice focuses on analysis and visualisation, providing industry specific solutions for; Retail, Financial Services, Media and Entertainment.

We’re building talented teams ready to change the world using Google technologies. So if you’re passionate, curious and keen to get stuck in — take a look at our Careers Page and join us for the ride!

--

--

Lee Doolan
Qodea Google Cloud Tech Blog

Cloud Data Warehouse Architect & Data Engineer | UK Based | https://www.linkedin.com/in/leedoolan77 | Thoughts are my own and not of my employer