Validate Your BigQuery SQL Table Metadata Using Just 1 Word

Leverage BigQuery SQL table metadata to deduplicate, partition and delete data — all using only one word.

Zach Quinn
Learning SQL

--

There are some queries I write so often that I genuinely believe that if I fell asleep on my keyboard, when I would wake up, they’d be entered and running in my SQL environment. Certain functions, especially datetime and timestamp conversions, fall into this category. As do a few simple quality assurance queries.

One such query that gets row counts by filtering on a given upload date is a go-to to determine how many rows loads to a table in a day.

That looks like this:

SELECT DATE(dt_updated) AS dt_updated, COUNT(1) AS rowCount 
FROM `project.dataset.table`
GROUP BY 1 ORDER BY 1 DESC

Executing the above query looks like this:

The reason this works so well and so precisely is because we’re able to filter rows by a provided date. This dt_updated column isn’t part of the data I ingest in this case. It’s calculated by the script at runtime, looking something like this:

import pandas as pd

dt_updated = pd.Timestamp.utcnow()

But if I hadn’t included that, there would be no reliable, dynamic date field available for filtering. If your data pipelines function predictably this is ok, but not ideal.

--

--

Zach Quinn
Learning SQL

Journalist—>Sr. Data Engineer; new stories every Monday.