Using the Change Function in BigQuery
How it can ease Data Engineering and Quality/Audit Checks
This time I want to introduce the CHANGE
function in BigQuery which returns all rows that have changed in a table for a given time range.
When integrating data from various data sources to BigQuery you often want to keep information about changes that happen to your data. Therefore Data Engineers often have to build additional SQL logic. For example if a data row is added or updated you could add columns to your schema which log this kind of information. For example something like this:
+---------+-------------+--------------------------+
| ID| change_type | change_time |
+---------+-------------+--------------------------+
| 1 | INSERT | 2022-06-07 20:06:00.488000 UTC |
| 2 | INSERT | 2022-06-07 20:18:08.490000 UTC |
+---------+-------------+--------------------------+
With the CHANGE
function Google offers this kind of information already build in within BigQuery. The CHANGES
function returns all rows that have changed in a table for a given time range. To use the CHANGES
function on a table, you must set the table's enable_change_history
option to TRUE
[1].
What you have to keep in mind is that the data returned by the CHANGES
function is limited to the…