Using BigQuery Change Data Capture with dbt Incremental

Jay Lewis
Feefo Product Engineering
2 min readSep 26, 2022

Google have recently announced a new pre-GA offering which lets you query changes made to a BigQuery table.

This feature lets you process incremental changes made to a table. Understanding what changes have been made to a table can help you do things like incrementally maintain a table replica outside of BigQuery while avoiding costly copies.

Google call this change history, I think of it as append-only change data capture, a common concept in data engineering.

I was really excited by this announcement as it hugely simplifies a problem I had. Namely, how to perform efficient incremental loads from a large, legacy, unpartitioned source table (which don’t cost the earth) using dbt.

Enough chat — see for yourself the difference in querying for ‘every row inserted yesterday’ using the new APPENDS TVF on our legacy table vs. using our lastTouched column:

old
new (ignore the red line, pre-GA things)

That is a whopping 99.95% cost reduction, and the query returns around twice as fast.

Avoiding the full table scan is huge, and I haven’t had to impact the legacy processes using/maintaining this table. Win/Win!

To integrate this new feature into a dbt incremental model, it’s remarkably simple:

{% set today = modules.datetime.datetime.now().strftime("%Y-%m-%d") %}{% set yesterday = (modules.datetime.datetime.now() - modules.datetime.timedelta(1)).strftime("%Y-%m-%d") %}with source as (
SELECT
column_names
...
{% if dbt.is_incremental() %}-- If incremental, load from the CDC table APPENDS table for yesterdayFROM APPENDS(TABLE {{ source('feefo-product','Sale') }}, '{{ yesterday }}', '{{ today }}'){% else %}-- If first run, or specified `--full-refresh`, load from the normal source tableFROM {{ source('feefo-product','Sale') }}{% endif %})SELECT * FROM source

Limitations

Change history is subject to the following limitations:

  • You can only view information about appends, not updates or deletions.
  • The data is limited to the time travel window of the table.

That’s all for today. Follow us for more tips and insights on product reviews.

--

--