Google launched Change Sequence Number for BigQuery
Better Management with CDC for Streaming Data to BigQuery
Google just announced that you can now set a _CHANGE_SEQUENCE_NUMBER for BigQuery change data capture (CDC) to control the order in which streaming UPSERTs are applied[1].
If you want to integrate data from other sources into BigQuery in a modern way, with good data quality and (near) real-time freshness, CDC via BigQuery streaming can be a good solution. For that reason, Google also launched Datastream for BigQuery last year.
When performing streaming upserts to BigQuery, records with the same primary key are, by default, ordered based on the system time at which they were ingested. This means that the most recently ingested record (with the latest system timestamp) will overwrite earlier ones. While this behavior works in many cases, it can be insufficient in scenarios where upserts to the same primary key happen very frequently within a short time frame, or when the ingestion order isn’t guaranteed[1][2].
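To make the difference concrete, here is a minimal Python sketch that simulates both behaviors in memory. It does not call BigQuery. The `Change` class, the `ingest_order` field, and the integer `seq` field are hypothetical stand-ins chosen for illustration; the real _CHANGE_SEQUENCE_NUMBER column has its own value format, described in Google's documentation[1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    key: str           # primary key of the row
    value: str         # new state of the row
    ingest_order: int  # order in which the row arrived (stand-in for system time)
    seq: int           # order assigned by the source (stand-in for _CHANGE_SEQUENCE_NUMBER)


def apply_by_ingestion(changes):
    """Default behavior: the most recently ingested row per key wins."""
    table = {}
    for change in sorted(changes, key=lambda c: c.ingest_order):
        table[change.key] = change.value
    return table


def apply_by_sequence(changes):
    """With a sequence number: the row with the highest sequence per key wins."""
    table = {}
    highest_seq = {}
    for change in sorted(changes, key=lambda c: c.ingest_order):
        if change.key not in highest_seq or change.seq > highest_seq[change.key]:
            highest_seq[change.key] = change.seq
            table[change.key] = change.value
    return table


# The "paid" event happened before "shipped" at the source but arrived last.
changes = [
    Change("order-1", "created", ingest_order=1, seq=1),
    Change("order-1", "shipped", ingest_order=2, seq=3),
    Change("order-1", "paid", ingest_order=3, seq=2),
]

by_ingestion = apply_by_ingestion(changes)
by_sequence = apply_by_sequence(changes)
print(by_ingestion)  # {'order-1': 'paid'}
print(by_sequence)   # {'order-1': 'shipped'}
```

With ingestion-time ordering, the late "paid" event overwrites the newer "shipped" state. With an explicit sequence number, the late arrival is ignored because a row with a higher sequence has already been applied.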