DBT Unit Tests

Luca Ferraro
Data Reply IT | DataTech
8 min readJun 5, 2024

In the field of data engineering and data analytics Data Build Tool (DBT) is gaining lot of attention. DBT is a tool that allows to transform, test and document data models in a very efficient way.

The transformation data models in DBT are represented as SQL queries and they may contain lot of complex logics. To ensure that the models correctly represent business logic, the version 1.8 of DBT introduces unit tests. Unit tests are a fundamental aspect of software development where individual units or components of a system are tested in isolation to ensure they behave as expected. These tests verify that each unit of software performs correctly, as designed.

In the context of DBT:

Unit tests allow you to validate your SQL modeling logic on a small set of static inputs before you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability- dbt Developer Hub

This publication aims to provide a detailed explanation of how unit tests work in DBT, showing a use case and highlighting the capabilities of this new feature.

DBT Fundamental Concepts and Tests

If you ever worked with DBT you may be familiar with the concept of Sources and Models.

Sources: The raw or unprocessed data tables or files that are taken from different data systems, including databases, APIs, or flat files, are represented as sources in DBT. They provide the raw data that must be cleaned, converted, and integrated in order to make it appropriate for analysis, and they act as the beginning point for data transformation procedures. Tables from an operational database comprising sales transactions, customer details, and product details, for instance, can be one type of source. These tables are taken out just as they are, and the modeling procedure turns them into useful datasets.

Models: SQL queries or statements that convert unprocessed data from source tables into processed or derived datasets that are more suited for reporting and analysis. They accomplish this by compiling, cleaning, and organizing datasets that make it simple to query and analyze them in order to gain knowledge. To build a coherent and cohesive dataset for analysis, a model can, for example, connect data from several source tables, apply transformations like aggregations or calculations, and filter out extraneous data.

DBT traditionally provides data tests (generic and singular) to ensure the quality, accuracy, and reliability of your data transformations. This kind of tests can be defined both on sources and models. DBT data tests are assertions or validations that you define to ensure the quality and correctness of your data transformations. They allow you to verify that your models produce the expected results and adhere to specific criteria or business rules. DBT automatically executes the defined tests against the corresponding models in your data warehouse. It generates SQL queries based on your test definitions and runs them against the underlying data.

After the introduction of DBT version 1.8 the syntax for DBT data tests has changed. Check docs!

For example DBT data tests may be used to check that the field of a source is not null and unique.

On the other hand, unit tests are not executed using data stored on warehouse. For unit testing, DBT generates a set of temporary tables based on the provided inputs defined in the tests. The computed rows of a model are compared with expected rows defined in the tests. If the expected and actual outputs are the same the unit test is successful. The data stored in the warehouse are not even read during the execution of a unit test.

Unit tests are defined by developers through yaml files. DBT will take care of converting the content of yaml files to SQL queries and run them. In the next section of this article will be explained how unit tests can be defined.

Since the business logic of data transformation is defined in models, DBT unit tests can be applied only on DBT models and not on DBT sources. If you want to define tests on sources you need to use “traditional” DBT data tests.

DBT Unit tests

There are many situations where DBT unit tests may be helpful. It is advisable to implement these tests for SQL models containing complex logics, such as:

  • case when operations with many when cases
  • Complex operations with date
  • Regex and concatenation of attributes

In these cases unit tests should be incorporated within the development process for test-driven approach and continuous integration to guarantee that modifications of the models do not cause failures.

On the other hand is not recommended to test specific built-in SQL functions e.g. max(), avg().

DBT Unit Testing is a very powerful feature, it allows you to define some mocked inputs and some expected outputs for your model. When unit tests are executed the model is run with input data defined in the unit tests. The column fields that are not explicitly defined in the unit test are considered null. No data are even read or written from the actual data storage that is connected to DBT. In order to run unit tests you just need to create a yml file where you specify the input data and predicted outcomes, you don’t need to make any operation on the database system. The real data stored in the warehouse are not affected by the unit tests.

Another powerful aspect of DBT unit tests is that you don’t need to specify all the input fields of a specific table. If you have a complex DBT model that reads hundreds or thousands of fields from many different tables you can just provide values from few of these fields useful for your specific tests. DBT will automatically assign null values to not defined fields.

Let’s see an example to clarify how DBT unit tests can be defined on a simple DBT model shop.sql:

--content of model shop

with shop AS (
shop.shop_code
company.company_name,
country.country_desc,
CASE WHEN shop.shop_code IN ('E000', 'E001', 'E002') THEN 'E-Commerce'
WHEN shop.shop_code = 'CS00' THEN
CASE WHEN COALESCE(shop.shop_type_code, '') ='V' THEN 'Vintage Clothing store'
ELSE 'Boutique Clothing Store'
END
WHEN SUBSTRING(shop.shop_code, 2, 1) = 'T' THEN 'Tech Store'
ELSE 'Generic Outlet'
END AS shop_description,
FROM {{ref(stg_shop}} shop
LEFT JOIN {{ref(stg_company)}} company
ON (shop.company_code = company.company_code)
LEFT JOIN {{ref(stg_country)}} country
ON (shop.country_code = country.country_code)

select * from shop

shop reads data from 3 different models. To compute the field shop_description some complex logics is required. This case_when operation is a typical situation where DBT unit tests may be applied. To define DBT unit tests we need to create a yml file with a custom name, for example shop_unit_tests.yml. The yaml files where unit tests are defined need to be saved inside the models folder of your DBT project.

#content of shop_unit_tests.yml

unit_tests:
- name: test_shop_description
description: "check shop_description field follows model logic in store model"
model: shop
given:
- input: ref('stg_shop')
rows:
- {shop_code: E000}
- {shop_code: E001}
- {shop_code: E002}
- {shop_code: CS00, shop_type_code: 'V'}
- {shop_code: CS00, shop_type_code: 'C'}
- {shop_code: AAAA}
- input: ref('stg_company')
rows: []
- input: ref('stg_country')
rows: []
expect:
rows:
- {shop_code: E000, shop_description: 'E-commerce'}
- {shop_code: E001, shop_description: 'E-commerce'}
- {shop_code: E002, shop_description: 'E-commerce'}
- {shop_code: CS00, shop_type_code: 'V', shop_description: 'Vintage Clothing store'}
- {shop_code: CS01, shop_type_code: 'C', shop_description: 'Boutique Clothing Store'}
- {shop_code: AAAA, shop_description: 'Generic Outlet'}

The purpose of the test is to cover all the cases for the values of shop_description field. To perform the test only some input fields of stg_shop are needed. The other fields not defined in the unit test are automatically set to null by DBT.

Why this kind of test is useful? If the model shop is updated by a developer and the logics behind shop_description is wrongly altered, the unit test will return an error. In this way unit tests enforce robustness of the model and force developers to correct the issues on the model before taking it to production.

The shop_unit_tests.yml contains many entries, let’s break it down:

  • name: unique identifier for the unit test.
  • description: can be used to define the purpose of the test.
  • model: each unit test is associated with a DBT model. It is required that the model is correctly defined (shop in our case) and should be runnable without errors. If the execution of shop model would result in a compilation or run-time error also the unit tests would raise error.
  • given: introduces the inputs of the models. All the sources and reference models that are read by the shop model should be defined. Even the tables that are not useful for the tests must be listed, otherwise the DBT unit test execution will fail.
  • expect: defines the expected rows obtained. These rows are compared with computed rows, (results of the model when running with input data defined in the given section). If any difference among computed and expected rows is detected, the DBT unit test will fail and will provide an accurate descriptions of the differences.

Another important constraint to execute unit tests is that all the parents of the tested DBT model must exist in the warehouse. To save resources you can build an empty versions of the models using --empty flag:

dbt run --empty 

DBT unit tests are triggered when the command dbt test is launched. Although this command runs all the tests defined in your project, including data tests. If you want to start only DBT unit tests you can use the command:

dbt test --select "test_type:unit"

If you want to run only the unit tests defined for the shop model you can launch:

dbt test --select "shop,test_type:unit"

The DBT unit tests may provide three different results:

  • success: the test was correctly defined and executed. The expected rows are identical to computed rows.
  • fail: there is at least a difference among expected output and calculated rows.
  • error: there was an error in the definition of the test. For example if you define an input field that does not exist in the warehouse table.

DBT unit tests on incremental models

If you need to define DBT unit tests on incremental models you should override is_incremental macro. What does it mean?

Incremental models in DBT can be identified by this configuration block:

{{ config(
materialized='incremental',
) }}

In this case the unit tests should define if the is_incremental is set to true or false:

unit_tests:
- name: test_name
model: model_name
overrides:
macros:
is_incremental: false
...

if is_incremental is set to false the model will be tested in full_refresh mode.

Conclusions

DBT unit tests provide an easy and simple way to check model logics on a small set of static inputs before materializing your model. This is particularly useful during development phase to ensure that the models of your project are robust and always follow business logics.

Unit Tests can be integrated into your Continuous Integration/Continuous Deployment (CI/CD) pipeline to automate the validation of your data transformations. By incorporating tests into your deployment process, you can ensure that changes to your DBT models are thoroughly validated before being promoted to production.

However unit tests are not enough to ensure good quality and accuracy of data transformation models. Data tests serve a different purpose and they should be used along with unit tests to build highly reliable DBT transformation pipelines.

--

--