Incremental Data Ingestion in BigQuery using MERGE

Antonio Cachuan
Plumbers Of Data Science
2 min read · Dec 1, 2021


When you are building your Data Warehouse on BigQuery, there are many techniques for ingesting your data. For example, you could do a complete insert from your data source into BigQuery; this may work if the source table is small. However, a more common scenario is to develop incremental pipelines.

Incremental Pipelines

The image below represents an incremental pipeline: on the first day, all the data is loaded into BigQuery, and on the following days only the new rows are inserted. At the same time, existing ids are compared to detect changes, in this case to the amount.

Example

In this scenario, we assume we have a federated table on PostgreSQL configured (check this article for more information) and we need to move daily transactional data; this data could get updated up to two days after the transaction happens.
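With a federated connection in place, BigQuery can read the PostgreSQL table through EXTERNAL_QUERY. A minimal sketch of pulling the rows that may still change, assuming a connection id, table, and column names that are illustrative only:

```sql
-- Read the last two days of transactions from the federated PostgreSQL table.
-- 'my-project.us.postgres-conn' and the schema are assumptions for this example.
SELECT *
FROM EXTERNAL_QUERY(
  'my-project.us.postgres-conn',
  '''SELECT transaction_id, amount, transaction_date
     FROM transactions
     WHERE transaction_date >= CURRENT_DATE - INTERVAL '2 days';''');
```

Filtering in the external query keeps the daily read small, which is the point of the incremental approach.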

To start, let's create our tables on Postgres and BigQuery.
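A minimal sketch of both tables, assuming an illustrative schema (transaction_id, amount, transaction_date) and a hypothetical dataset name my_dataset:

```sql
-- PostgreSQL side: the source transactional table.
CREATE TABLE transactions (
  transaction_id   INTEGER PRIMARY KEY,
  amount           NUMERIC(10, 2),
  transaction_date DATE
);
```

```sql
-- BigQuery side: the destination table, mirroring the source schema.
CREATE TABLE IF NOT EXISTS my_dataset.transactions (
  transaction_id   INT64,
  amount           NUMERIC,
  transaction_date DATE
);
```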

Then, to create the incremental pipeline, we'll be using MERGE, defining transaction_id as the main…
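A hedged sketch of what such a MERGE could look like, matching on transaction_id as the source states; the connection id, dataset, and column names are assumptions carried over from the examples above:

```sql
-- Upsert the last two days of source rows into the BigQuery table.
-- New ids are inserted; existing ids with a changed amount are updated.
MERGE my_dataset.transactions AS t
USING (
  SELECT *
  FROM EXTERNAL_QUERY(
    'my-project.us.postgres-conn',
    '''SELECT transaction_id, amount, transaction_date
       FROM transactions
       WHERE transaction_date >= CURRENT_DATE - INTERVAL '2 days';''')
) AS s
ON t.transaction_id = s.transaction_id
WHEN MATCHED AND t.amount != s.amount THEN
  UPDATE SET amount = s.amount
WHEN NOT MATCHED THEN
  INSERT (transaction_id, amount, transaction_date)
  VALUES (s.transaction_id, s.amount, s.transaction_date);
```

Scheduling this statement daily (for example with BigQuery scheduled queries) gives the behavior described above: day one performs the full load, and each subsequent run only inserts new rows and reconciles changed amounts.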



Google Cloud Professional Data Engineer (2x GCP). When code meets data, success is assured 🧡. Happy to share code and ideas 💡 linkedin.com/in/antoniocachuan/