Using BigQuery Change Data Capture with dbt Incremental
Google have recently announced a new pre-GA offering which lets you query changes made to a BigQuery table.
This feature lets you process incremental changes made to a table. Understanding what changes have been made to a table can help you do things like incrementally maintain a table replica outside of BigQuery while avoiding costly copies.
Google call this change history, I think of it as append-only change data capture, a common concept in data engineering.
I was really excited by this announcement as it hugely simplifies a problem I had. Namely, how to perform efficient incremental loads from a large, legacy, unpartitioned source table (which don’t cost the earth) using dbt.
Enough chat — see for yourself the difference in querying for ‘every row inserted yesterday’ using the new APPENDS TVF on our legacy table vs. using our lastTouched column:
That is a whopping 99.95% cost reduction, and the query returns around twice as fast.
Avoiding the full table scan is huge, and I haven’t had to impact the legacy processes using/maintaining this table. Win/Win!
To integrate this new feature into a dbt incremental model, it’s remarkably simple:
{% set today = modules.datetime.datetime.now().strftime("%Y-%m-%d") %}{% set yesterday = (modules.datetime.datetime.now() - modules.datetime.timedelta(1)).strftime("%Y-%m-%d") %}with source as (
SELECT
column_names
...{% if dbt.is_incremental() %}-- If incremental, load from the CDC table APPENDS table for yesterdayFROM APPENDS(TABLE {{ source('feefo-product','Sale') }}, '{{ yesterday }}', '{{ today }}'){% else %}-- If first run, or specified `--full-refresh`, load from the normal source tableFROM {{ source('feefo-product','Sale') }}{% endif %})SELECT * FROM source
Limitations
Change history is subject to the following limitations:
- You can only view information about appends, not updates or deletions.
- The data is limited to the time travel window of the table.
That’s all for today. Follow us for more tips and insights on product reviews.