Automate Your BigQuery Schema Definitions With 5 Lines of Python

Tired of manually writing my BigQuery schemas, I wrote a function that makes schema definition less time-consuming.

Zach Quinn
Pipeline: Your Data Engineering Resource


Humanoid robot.
Automation in human form. Photo by Possessed Photography on Unsplash


Optimizing BigQuery Schema Definitions

One thing I’ve learned while programming is that I’d rather do more work now to do less work later.

That’s why, if I find myself performing a redundant operation, I begin exploring the possibility of writing a function or implementing another method to avoid doing mundane, repetitive work.

Schema definition, while important, is one of the most mundane, repetitive tasks I do as a data engineer. The task can be completed in the BigQuery UI, but since I primarily use the Python client, I define my schemas manually so that BigQuery doesn’t infer the wrong type for any field.

In doing so, I often end up with some pretty ugly, lengthy code that, even with a config file, can become unruly.

If you’re dealing with nested fields, it can become downright unpleasant, like the snippet below:
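
As a rough sketch of what hand-written nested schema code tends to look like, here is a hypothetical `orders` table. The table, its field names, and the choice to write it as plain Python dicts in BigQuery's JSON schema representation (rather than with the client's `SchemaField` objects) are all assumptions made for illustration, so the block runs without any dependencies.

```python
# Hypothetical "orders" table, spelled out by hand in BigQuery's JSON schema
# representation. Every nested RECORD repeats the same name/type/mode
# boilerplate one level deeper.
orders_schema = [
    {"name": "order_id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "created_at", "type": "TIMESTAMP", "mode": "NULLABLE"},
    {
        "name": "customer",
        "type": "RECORD",
        "mode": "NULLABLE",
        "fields": [
            {"name": "id", "type": "STRING", "mode": "REQUIRED"},
            {"name": "email", "type": "STRING", "mode": "NULLABLE"},
            {
                "name": "address",
                "type": "RECORD",
                "mode": "NULLABLE",
                "fields": [
                    {"name": "city", "type": "STRING", "mode": "NULLABLE"},
                    {"name": "postal_code", "type": "STRING", "mode": "NULLABLE"},
                ],
            },
        ],
    },
    {
        "name": "line_items",
        "type": "RECORD",
        "mode": "REPEATED",  # an array of records
        "fields": [
            {"name": "sku", "type": "STRING", "mode": "REQUIRED"},
            {"name": "quantity", "type": "INTEGER", "mode": "NULLABLE"},
            {"name": "unit_price", "type": "NUMERIC", "mode": "NULLABLE"},
        ],
    },
]
```

With `google-cloud-bigquery` installed, each dict could be converted to a client object with `bigquery.SchemaField.from_api_repr(field)`. Either way, four top-level columns already take about thirty lines, and every extra level of nesting adds more of the same.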
