Testing frameworks in dbt

Robbert · Published in datamindedbe · 6 min read · Jun 12, 2023

When I was struggling to find the right way to write tests in dbt, I got great inspiration from Mariah Rogers’ talk about testing at Coalesce 2022. The key message there is that there are fundamentally two different kinds of tests: assumptions about the data, and assertions about the code. I fully subscribe to that idea, and if you look at it from a high-level system’s view, I think it makes sense:

System’s view of a dbt run

Let’s look at dbt run as a box with inputs and outputs. Your inputs are 1) your model / macro code and 2) the data that your models refer to. Both are used in dbt run to produce some output data.

Usually, we run tests against the right-hand side of the picture: the output data. The problem is that you have two inputs that could both be changing, so if a test fails, you don’t know where to look. Mariah’s suggestion is to be more explicit about what you’re testing: code or data.

Freeze data to test code, freeze code to test data

From the system’s perspective again, that makes sense: freeze all inputs except the one you want to test, and monitor how the output changes.

In practice, that means you’ll fix your input data during development by using mock data, until you get what you expect. Usually, you also define what you expect (also using mock data), so the monitoring part is a simple comparison. When you know your code works, you push that to production where it’s frozen (until you release a new version of course), and you want to focus on testing your data.

So that’s what we did, and along the way, we found four different cases to test for, each with its own way of testing. This is what we ended up with:

  1. Testing model code: dbt_unit_testing
  2. Testing macro code: dbt test (in a specific way)
  3. Testing data content: Soda
  4. Testing — or rather, enforcing — data schemas: dbt data contracts

Let’s have a closer look at each of them.

Testing model code with dbt_unit_testing

To test the dbt models we write, we use a package called dbt_unit_testing by EqualExperts. You mock the input data (in dbt terms: sources and references), you specify the expected output, and there’s a slightly scary step where you have to override the ref and source macros, or use the explicit dbt_unit_testing. prefix for those two. But if you look at what that prefix actually amounts to, it’s not too bad:

select * from {{ dbt_unit_testing.source('my_schema','my_table') }}
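If you go the override route instead, the package’s README describes shadowing dbt’s built-in ref and source macros in your own macros/ folder so they delegate to the package. A minimal sketch, assuming a file like macros/unit_testing_overrides.sql:

{% macro ref(model_name) %}
    {# delegate to dbt_unit_testing so mocks can be injected during tests #}
    {{ return(dbt_unit_testing.ref(model_name)) }}
{% endmacro %}

{% macro source(source, table_name) %}
    {{ return(dbt_unit_testing.source(source, table_name)) }}
{% endmacro %}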

The unit test code looks something like this:

{{ config(tags=["unit-test"]) }}

{% call dbt_unit_testing.test("stg_my_staging_model", "Testing whether my_staging_model works") %}

  {% call dbt_unit_testing.mock_source("my_schema", "my_table", {"input_format": "csv"}) %}
    id | some_number::int | some_varchar | date::date   | ts::timestamp(6)
    1  | 150              | 'some_text'  | '2023-06-07' | current_timestamp(6)
  {% endcall %}

  {% call dbt_unit_testing.expect({"input_format": "csv"}) %}
    id | derived_varchar | derived_date::date
    1  | 'some_text'     | '2023-12-31'
  {% endcall %}

{% endcall %}

The alternative is the built-in dbt seeds. Seeds can be annoying because you have to run dbt seed, which puts the data in your data warehouse. The dbt_unit_testing package uses plain selects that don’t leave dirt behind, and it brings some extra niceties too, like well-formatted test output with a diff between expected and actual data (see below) and different ways to input mock data, such as the CSV format you see above.
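For instance, the same mock could also be written in the package’s SQL input format (a sketch; SQL is the package’s default input_format, so the option could even be omitted):

{% call dbt_unit_testing.mock_source("my_schema", "my_table", {"input_format": "sql"}) %}
    select
        1 as id,
        150 as some_number,
        'some_text' as some_varchar,
        cast('2023-06-07' as date) as "date",
        current_timestamp(6) as ts
{% endcall %}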

Test output example:

18:53:04  MODEL: stg_my_staging_model
18:53:04  TEST:  Testing whether my_staging_model works
18:53:04  ERROR: Rows mismatch:
18:53:04  | diff | count | id | derived_varchar | derived_date |
18:53:04  | ---- | ----- | -- | --------------- | ------------ |
18:53:04  | +    | 1     | 1  | some_text       | 2024-01-01   |
18:53:04  | -    | 1     | 1  | some_text       | 2023-12-31   |
18:53:06  1 of 1 FAIL 1 test_stg_my_staging_model .... [FAIL 1 in 2.71s]

I put these tests under tests/unit/ in my dbt project.
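Since each test carries the unit-test tag from its config block, you can run only the unit tests (and exclude them elsewhere) with dbt’s tag selector:

dbt test --select tag:unit-test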

Testing macro code with dbt test

Where the fun ends with dbt_unit_testing is when you want to test something other than a model. I tried testing dbt macros with the same framework, but then you’d have to create a dbt model specifically for macro testing, which shouldn’t run in a regular dbt run. So I took a step back and went with plain old dbt test. I wanted it to resemble the unit tests, so this is what I came up with:

with input as (
    select
        0 as id,
        'Some_input_that_needs_formatting' as input_col,
        'Some input that needs formatting' as expected_output_col
    union all
    select
        1 as id,
        'More_input_to_be_ -- _formatted!' as input_col,
        'More input to be formatted!' as expected_output_col
),

macro_output as (
    select id, {{ my_format_macro("input_col") }} as macro_output_col
    from input
)

select input.id, input_col, expected_output_col, macro_output_col
from input
join macro_output on input.id = macro_output.id
where macro_output.macro_output_col != input.expected_output_col
Here you can see I tried to keep the input and expected output close together. You can easily spot what the macro is doing, and if you want to add a test case, you only have to copy one, modify it, and put a union all in between. I put these tests under tests/macros/ in my dbt project.
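For context, a hypothetical implementation of my_format_macro that would satisfy both test cases could look like this (my sketch, not from the original project):

{% macro my_format_macro(column_name) %}
    {# replace the '_ -- _' separator with a space, then turn the remaining underscores into spaces #}
    replace(replace({{ column_name }}, '_ -- _', ' '), '_', ' ')
{% endmacro %}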

Testing data content with Soda

For testing the actual content of the data, we’ve found that Soda works well for us. The biggest advantage here is how easy it is to write these tests. Soda uses YAML to define the tests, and it reads almost like plain English, as you can see in this example from the Soda docs:

checks for dim_customer:
  - invalid_count(email_address) = 0:
      valid format: email
      name: Ensure values are formatted as email addresses
  - missing_count(last_name) = 0:
      name: Ensure there are no null values in the Last Name column
  - duplicate_count(phone) = 0:
      name: No duplicate phone numbers
  - freshness(date_first_purchase) < 7d:
      name: Data in this dataset is less than 7 days old

It even becomes viable for less technical profiles to write tests, or at least to read them. Soda also brings a lot of built-in checks to the table: referential integrity, row counts, text formats (e.g. email addresses), cross-table checks (e.g. same number of rows), even distribution checks… See the docs for the Soda Checks Language (SodaCL). Nothing you can’t also achieve with Great Expectations and custom tests, but Soda makes it easy. I put these tests under soda/ in the top level of my dbt project.
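Running the checks happens outside of dbt, with Soda’s scan command. A sketch, assuming a data source named my_datasource defined in configuration.yml and the checks saved under soda/checks.yml:

soda scan -d my_datasource -c configuration.yml soda/checks.yml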

Testing (enforcing) data schemas with dbt data contracts

This is my favorite one. At first, our plan was to run some kind of dbt test that would check the schema of certain tables before actually running them with dbt run. Practically speaking, that would be an Airflow DAG where the dbt run task depends on the dbt test task.

Since dbt v1.5, that’s not necessary anymore: you can use the built-in contracts for that. You already have a models.yaml file describing the models in your dbt project, but now you can actually enforce a schema for those models. dbt will refuse to run a model if it does not adhere to its contract.

All you have to do is turn it on in the models.yaml where you defined the columns for that model:

- name: dim_customers
  config:
    contract:
      enforced: true
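Note that an enforced contract needs a name and data_type for every column in the model’s yaml entry. A fuller sketch (the column names and types here are assumptions, borrowed from the error output below):

- name: dim_customers
  config:
    contract:
      enforced: true
  columns:
    - name: customer_id
      data_type: bigint
    - name: load_date
      data_type: date
    - name: last_updated_dt
      data_type: date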

As output, you’ll get something like this:

14:17:21  Compilation Error in model dim_customers (models/marts/dim_customers.sql)
14:17:21    This model has an enforced contract that failed.
14:17:21    Please ensure the name, data_type, and number of columns in your contract match the columns in your model's definition.
14:17:21
14:17:21    | column_name     | definition_type | contract_type | mismatch_reason       |
14:17:21    | --------------- | --------------- | ------------- | --------------------- |
14:17:21    | customer_id     | VARCHAR         | BIGINT        | data type mismatch    |
14:17:21    | load_date       |                 | DATE          | missing in definition |
14:17:21    | last_updated_dt |                 | DATE          | missing in definition |

dbt recommends splitting your models into staging, intermediate, and marts folders. You’d only define contracts for your marts / output models, because that’s where others count on the models to adhere to a contract, much like an API.

Conclusion

Make the difference between code tests and data tests explicit, and use the right framework for each kind of test. The dbt ecosystem for testing is still evolving, and I expect dbt to incorporate some of these approaches into the core, but for now, there are some great packages and tools out there to help you write the right tests in the right way. Good luck!

There was a webinar on this topic on July 7th, 2023, with practical examples; you can still watch it here: https://www.youtube.com/watch?v=t--bIdEZIF8

You can also find the code examples here: https://github.com/datamindedbe/conveyor-samples/tree/demo/data_quality/various/quality_coffee_dbt/dbt
