EtLT approach: Load data into BigQuery with DBT

It may sound counterintuitive, but it can save your day

Yerachmiel Feltzman
Israeli Tech Radar

--

Do you need to incrementally ingest data from a bucket to BigQuery while slightly transforming it during ingestion? Do you want to keep your tech stack lean, and do you already use DBT? Come with me to see one interesting approach.

What you will find inside:

  1. When bq load won’t work.
  2. What is the EtLT approach?
  3. EtLT approach using DBT incremental models on external tables.
  4. Caveats and limitations to consider.

Do you need to incrementally ingest data from Google Cloud Storage to BigQuery? Easy-peasy! Just set up a load job:

# Reading CSV files is still (unfortunately? 😜)
# a common data engineering task
bq load \
--replace=false \
--field_delimiter='\t' \
--source_format='CSV' \
"$bq_project:$bq_dataset.$bq_table" \
'gs://my-bucket/*'
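
If you want to try the snippet as-is, here is a minimal sketch of the shell variables it references (the project, dataset, and table names below are placeholders, not values from this article):

# Placeholder values: substitute your own project, dataset and table.
bq_project="my-gcp-project"  # GCP project that owns the destination dataset
bq_dataset="raw_zone"        # destination BigQuery dataset
bq_table="events"            # destination BigQuery table

Note that --replace=false appends each run's files to the destination table instead of truncating it, which is what keeps the load incremental.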

“Wait wait wait! I have TBs of data — we must partition the data by a given timestamp column.”

Okey dokey! Let’s add this to the beginning of our loading script:

bq query --use_legacy_sql=false \
"
CREATE TABLE IF NOT EXISTS \`$bq_project.$bq_dataset.$bq_table\` (
  # The source CSV schema…
  # ('event_ts' below is an illustrative placeholder for the
  #  timestamp column the table is partitioned by)
  event_ts TIMESTAMP
)
PARTITION BY DATE(event_ts);
"
--