EtLT approach: Load data into BigQuery with DBT

It may sound counterintuitive, but it can save your day

Yerachmiel Feltzman
Israeli Tech Radar

--

Do you need to incrementally ingest data from a bucket to BigQuery while slightly transforming it during ingestion? Do you want to keep your tech stack lean, and do you already use DBT? Come with me to see one interesting approach.

What you will find inside:

  1. When bq load won’t work.
  2. What is the EtLT approach?
  3. EtLT approach using DBT incremental models on external tables.
  4. Caveats and limitations to consider.

Do you need to incrementally ingest data from Google Cloud Storage to BigQuery? Easy-peasy! Just set up a load job:

# Reading CSV files is still (unfortunately? 😜)
# a common data engineering task
bq load \
--replace=false \
--field_delimiter='\t' \
--source_format='CSV' \
"$bq_project:$bq_dataset.$bq_table" \
'gs://my-bucket/*'
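
If you want to try the snippet as-is, here is a minimal sketch of the shell variables it references (the project, dataset, and table names below are placeholders, not values from this article):

# Placeholder values: substitute your own project, dataset and table.
bq_project="my-gcp-project"  # GCP project that owns the destination dataset
bq_dataset="raw_zone"        # destination BigQuery dataset
bq_table="events"            # destination BigQuery table

Note that --replace=false appends each run's files to the destination table instead of truncating it, which is what keeps the load incremental.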

“Wait wait wait! I have TBs of data — we must partition the data by a given timestamp column.”

Okey dokey! Let’s add this to the beginning of our loading script:

bq query --use_legacy_sql=false \
"
CREATE TABLE IF NOT EXISTS \`$bq_project.$bq_dataset.$bq_table\` (
  # The source CSV schema…
  # ('event_ts' below is an illustrative placeholder for the
  #  timestamp column the table is partitioned by)
  event_ts TIMESTAMP
)
PARTITION BY DATE(event_ts);
"
--