How can you load a Data Dictionary into BigQuery again and again . . . ?
And give better visibility & understanding of data
BigQuery data engineers are responsible for creating each element used in a data platform. This includes defining table names, partitioning configurations, column names, data types and descriptions for example.
Much of these elements rely on their technical experience to get right, and once in production cannot be easily changed.
What can be changed is metadata such as description labels.
These can be modified numerous times after inception without risk and are helpful for analysts and other developers to understand a dataset. It is therefore important they are kept current and closely aligned to familiar business object usage and lingo.
Luckily in BigQuery each object has description fields we can populate ‘after the event’ from more reputable sources than us.
This blog post is about how we do that, specifically for datasets, tables and table columns.
Updating a single description field
The simplest way to update a description field of a BigQuery object is using a DDL command.
The code block below describes how you can update a single field for a dataset, table and table column:
But that will soon become an unwieldy way to update many description fields. It’ll be cool if we could do this in bulk, maybe using BigQuery’s procedural language.
Updating description fields in bulk
This will be as simple as setting up a loop through 3 tables containing our data object descriptions i.e. one for the datasets, tables and table columns.
These tables we’ll assume already exist, maybe populated on a nightly basis from an external Data Dictionary tool, wiki etc, or even simple CSV files.
NB for a great starter on BigQuery CSV file loading and associated gremlins please see this excellent blog post from Tim Ellis-Smith, my colleague here at CTS.
Datasets
Assuming we have a table set up similar to below:
We can loop through this table and set the dataset descriptions using the code below:
Tables
Assuming we have a table set up similar to below:
We can loop through this table and set the table descriptions using the code below:
Table Columns
Assuming we have a table set up similar to below:
We can loop through this table and set the table column descriptions using the code below:
Result
The description fields should now be populated and viewable in the BigQuery console as below.
Python Version
If anybody wants to implement this in python you can using the gist code here.
It’s a slight variation on the BigQuery code samples for dataset, table and table column, see links.
Wrapping Up
This is a fairly simply way to load a data dictionary into BigQuery giving your analysts better visibility and understanding of new datasets and fields.
Best of all the Data Engineers can be removed as owner of this important business information and leave describing data objects to the real data owners.
An added bonus is all that lovely metadata, will be auto added and searchable from the data catalog of Dataplex too, if configured.
Dataplex is a Google service to help us search for insightful data, and understand and make data useful. Take a look at this useful guide on what it does and why it can be useful to you.
We’ve also learnt how to loop through a table and run dynamically built DDL SQL statements along the way which can be reused for many other tasks.
I hope this is useful, and please feel free to drop me a line for ideas on future posts.
About CTS
CTS is the largest dedicated Google Cloud practice in Europe and one of the world’s leading Google Cloud experts, winning 2020 Google Partner of the Year Awards for both Workspace and GCP.
We offer a unique full stack Google Cloud solution for businesses, encompassing cloud migration and infrastructure modernisation. Our data practice focuses on analysis and visualisation, providing industry specific solutions for; Retail, Financial Services, Media and Entertainment.
We’re building talented teams ready to change the world using Google technologies. So if you’re passionate, curious and keen to get stuck in — take a look at our Careers Page and join us for the ride!