dbt unit-test framework
This is a follow-up of my previous article on Unit testing with dbt.
Since dbt v1.8.0-b1, a dedicated unit-testing framework has been available, and at Teads we decided to try it.
All in one place
No more defining fixture and expected models, or declaring a dbt_utils.equality test in a YML file.
All it requires is a single YML file placed in your model folder. I personally store them in a ci
folder. The file format is quite straightforward:
unit_tests:
  - name: my_test
    description: "some meaningful description"
    model: the_model_I_want_to_test
    given:
      - input: ref('an_upstream_model')
        rows:
          - {col1: val1, col2: val2}
      - input: ref('another_upstream_model')
        rows:
          - {col1: val1, col2: val2}
    expect:
      rows:
        - {col1: val1, col2: val2}
To run your test: dbt test --select my_test
Several tests in one file
Unlike the previous way of conducting unit-tests in dbt, you can now define multiple tests on the same model:
unit_tests:
  - name: my_test_1
    ...
  - name: my_test_2
    ...
Now, it’s possible to unit-test every single feature independently, using a minimal set of inputs and expected outcomes. This greatly enhances the maintainability of the unit-tests.
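As an illustration, here is a sketch of two focused tests on a single model, each exercising one feature (model, column names, and the tested behaviors are invented for the example):

```yaml
unit_tests:
  - name: test_deduplicates_events        # hypothetical: checks duplicate removal
    model: my_events_model                # invented model name
    given:
      - input: ref('raw_events')
        rows:
          - {event_id: 1, user_id: 10}
          - {event_id: 1, user_id: 10}    # duplicate row
    expect:
      rows:
        - {event_id: 1, user_id: 10}

  - name: test_defaults_missing_country   # hypothetical: checks a default value
    model: my_events_model
    given:
      - input: ref('raw_events')
        rows:
          - {event_id: 2, country: null}
    expect:
      rows:
        - {event_id: 2, country: "unknown"}
```

Each test carries only the rows relevant to the feature it checks, which keeps failures easy to diagnose.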
Upstream Dependencies materialization
As most dbt models refer to upstream models or sources, these must be fixtured as part of the unit tests. They are not materialized physically in your engine; instead, they are inlined into the tested model, like ephemeral models.
This makes debugging much easier, since everything is inlined, and it eliminates the need for a separate test project with a cleanup strategy, thus avoiding collisions in your test model names.
Variable override
In our use case, we are using incremental table materialization partitioned by hour. Consequently, on production, our models are executed with an input variable supplied via the dbt command line:
dbt run --select my_model --vars '{date:"2024:04:16 17:00:00"}'
This variable can be specified in the unit tests, enabling us to validate the logic associated with it:
unit_tests:
  - name: my_test
    description: "some meaningful description"
    model: the_model_I_want_to_test
    overrides:
      vars:
        date: "2024:04:16 17:00:00"
    given:
      ...
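Beyond vars, the framework's overrides section can also pin macros and environment variables, which is handy for forcing an incremental model into a given mode during the test. A sketch, with illustrative values:

```yaml
unit_tests:
  - name: my_incremental_test
    model: the_model_I_want_to_test
    overrides:
      vars:
        date: "2024:04:16 17:00:00"
      macros:
        # force the incremental branch of the model's SQL
        is_incremental: true
      env_vars:
        MY_ENV_VAR: "some_value"   # hypothetical variable
    given:
      ...
```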
Nested structure integration
It’s feasible to define arrays and complex structures:
bidders: 'struct([struct(struct(895528 AS gid, ...) AS element)] AS list)'
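For deeply nested inputs, an alternative worth considering is the SQL fixture format, where the rows of an input are written directly as a query instead of a dict. A sketch, with invented model and column names:

```yaml
given:
  - input: ref('bids')
    format: sql
    rows: |
      select 1 as auction_id,
             [struct(struct(895528 as gid) as element)] as bidders
```

This lets you lean on the engine's own SQL syntax for structs and arrays rather than quoting it inside a YML dict.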
There are some limitations to be aware of:
- Every field of the structure needs to be defined, even the NULL ones. Specifying unused fields as NULL, such as NULL AS cid, often works, but not always: for floats, you may need an explicit cast, like CAST(NULL AS FLOAT64) AS price. Consequently, in our case, we ended up with several lines of NULL fields just to set a single field in a structure.
- The error message provided by the database when the structure is not properly defined is not always clear and doesn’t offer much assistance in fixing it:
Database Error
Invalid cast from STRUCT<list ARRAY<STRUCT<element STRUCT<gid INT64, cid INT64, score INT64, ...>>>> to STRUCT<list ARRAY<STRUCT<element STRUCT<gid INT64, cid INT64, score FLOAT64, ...>>>> at [52:14136]
- Ultimately, if a developer adds a new field to the structure, it can break our tests on the master branch, which isn’t ideal.
Clear output for the unit-tests’ failures
When a test fails, a clear output with color highlighting helps identify the issue quickly.
No support for UDF
As of now, if your model incorporates a User-Defined Function (UDF), the generated SQL for the unit test is incorrect. Consequently, you won’t be able to successfully test your model using the unit test framework.
My feedback
This unit-test framework is definitely nicer to use than dbt_utils.equality: faster to write and debug, more atomic tests, and clearer output. It has become the new standard at Teads.