How to use Column Value Diff for Data Impact Assessment in dbt with Recce

Dave Flynn
In the Pipeline
Published in
3 min readMar 14, 2024

Column value diff is an essential stat for determining the level of impact that has occurred in a model following a code change. The value diff results will show you which columns have been impacted and provide the perfect starting point for understanding where to check next to further validate your work.

Value Diff can shows the percentage match of columns in a dbt model before-and-after updating model code
Value Diff can shows the percentage match of columns in a dbt model before-and-after updating model code

When to do it

For refactoring jobs, when you expect there to be zero impact on the data, seeing that 100% match rate between dev and prod gives you the confidence to know that the data is the same.

For logic change jobs, when you expect to see change, the percentage will let you immediately know how much impact has occurred — The lower the percentage match, the higher the impact.

Value diff is useful for any situation when you want to get a high-level overview of data impact on a model/table in your dbt project. It should be an integral part of your toolkit of validation checks as part of your dbt testing practices.

Value diff in Recce

You can perform Value Diffs in Recce on individual models, or subsets of models easily. Here’s a couple of examples that show how to do it.

Refactor

In the refactoring example, two intermediate tables have been added upstream. The expectation is that downstream models should not be impacted.

Using the multi-model value diff in Recce, you can quickly check if the data in these downstream models matches your baseline from before the new upstream models were added.

Check the percentage match rate for multiple dbt models in Recce

Each model shows that All columns match and clicking a model confirms all columns have 100% match. In this case, you can be confident that your change has not caused any unintended impact on downstream data.

Check out the online demo for this dbt PR yourself.

Logic change

In the logic change example, the calculation for customer_lifetime_value in the customer model has been updated. There is an expectation that the values in this column will change, and that there will also be downstream impact.

Check the percentage match rate for multiple dbt models in Recce
Check the percentage match rate for multiple dbt models in Recce

The value shows a mis-match in columns in the customer and customer_segment models. Checking the diff results shows which columns are impacted and the percentage match.

Check out the online demo for this dbt PR yourself.

Recce

Learn more about the features of Recce in the following Loom:

--

--

Dave Flynn
In the Pipeline

Dave is a developer advocate for DataRecce.io — the data modeling validation and PR review toolkit for dbt data projects