Unit-testing DBT Postgres

Albert Franzi · Jan 2, 2024

This post delves into a less-traveled yet critical aspect of DBT: the art of unit-testing our models against diverse and complex scenarios.

After extensive research into DBT documentation and observing a gap in the developer community’s approach to unit-testing, we decided to devise a straightforward method to ensure the reliability of our DBT models.

It is widely acknowledged that a key advantage of DBT is its robust testing capabilities. These features facilitate the enforcement of various constraints and enable the crafting of custom tests to verify specific business rules within our models.

However, it's crucial to note that these tests run at execution time, meaning that our code, and potentially any hidden bugs, has already been deployed by the time they catch anything.

Numerous organizations have integrated DBT testing into their CI/CD pipelines by connecting to a real database environment, using either dummy or actual production data.

Drawing from my extensive experience as a Data Engineer, particularly my proficiency with Spark and Kafka, I have always prioritized writing unit tests for new developments. This practice prompted a significant inquiry when I began working with DBT: Why not apply the same rigorous unit-testing methodology to our DBT SQL models?

So, how are we going to test it?

To make this feasible, we will take advantage of the following utilities:

  • pytest-postgresql: to spin up a throwaway Postgres instance from within our test suite.
  • dbt programmatic invocations (available since v1.5): to run dbt from Python and introspect the results.

0. Awesome, let’s start!

We begin by leveraging the pytest-postgresql library, enabling us to validate our DBT Postgres projects without relying on a pre-existing database environment.

Let's start by injecting the pytest fixture into our code and initializing the Postgres process we will use in our unit tests.
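A minimal sketch of what this can look like, assuming a conftest.py that uses pytest-postgresql's fixture factories (the fixture names below are the library's defaults):

```python
# conftest.py: a minimal sketch using pytest-postgresql's fixture factories.
from pytest_postgresql import factories

# Spin up a dedicated Postgres process for the test session
# (port=None lets the library pick a free port).
postgresql_proc = factories.postgresql_proc(port=None)

# A client fixture that hands each test a connection to a freshly created database.
postgresql = factories.postgresql("postgresql_proc")


def test_postgres_is_up(postgresql):
    # The fixture yields a psycopg connection we can use directly.
    with postgresql.cursor() as cur:
        cur.execute("SELECT 1")
        assert cur.fetchone() == (1,)
```

The connection parameters exposed by postgresql_proc (host, port, user) are what we can later feed into the dbt profile.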

1. Wait! Well done, dbt-community! 🤩

In my previous post, we used Python's subprocess module to run dbt, and we parsed the logs ourselves.

Lucky for us, dbt has supported programmatic invocations since v1.5, allowing us to add an extra Python layer before executing our dbt code and another one after the execution, so we can introspect the dbt results programmatically.
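A minimal sketch of a programmatic invocation, following the pattern from the dbt documentation (the project directory is a placeholder):

```python
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()
res: dbtRunnerResult = dbt.invoke(["run", "--project-dir", "my_dbt_project"])

# res.success flags the overall outcome; res.result holds the structured
# node results we can introspect instead of parsing log lines ourselves.
for r in res.result:
    print(f"{r.node.name}: {r.status}")
```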

2. Great, so?

Once we can execute dbt programmatically, we need to make sure we can recreate the specific dbt lineage to be tested.

After executing dbt parse, we get the list of external sources used in our project; only if we document them with their data types 😉 can we generate the DDLs needed to create and populate them before running dbt.
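A minimal sketch of the idea, reading the manifest.json that dbt parse produces (generate_source_ddls is our own illustrative name; the real logic lives in dbt_validator.py):

```python
import json


def generate_source_ddls(manifest_path: str) -> list[str]:
    """Build CREATE TABLE statements for every documented external source."""
    with open(manifest_path) as f:
        manifest = json.load(f)

    ddls = []
    for source in manifest["sources"].values():
        # data_type is only present if we documented it in the source YAML.
        columns = ", ".join(
            f"{name} {column['data_type']}"
            for name, column in source["columns"].items()
        )
        ddls.append(
            f"CREATE SCHEMA IF NOT EXISTS {source['schema']}; "
            f"CREATE TABLE {source['schema']}.{source['name']} ({columns});"
        )
    return ddls
```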

See full code in dbt_validator.py

Then, we copy our CSVs into the source tables.
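A minimal sketch, assuming the psycopg2-style connection that pytest-postgresql provides (with psycopg 3 you would use cursor.copy() instead of copy_expert):

```python
def load_csv(connection, csv_path: str, table: str) -> None:
    """Stream a CSV file straight into a recreated source table."""
    with connection.cursor() as cur, open(csv_path) as f:
        cur.copy_expert(f"COPY {table} FROM STDIN WITH CSV HEADER", f)
    connection.commit()
```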

3. DB ready sir! 👨‍🍳

Great: we have already recreated the external sources and injected the data, so we can proceed to run dbt, taking advantage of its testing features.

As with the parse outcome, we can inspect the run and test executions and validate whether any model is broken or any defined test is not behaving as expected.
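A minimal sketch of that validation (run_and_test is our own illustrative helper):

```python
from dbt.cli.main import dbtRunner


def run_and_test(project_dir: str) -> None:
    """Run the models, run the tests, and fail loudly on any broken node."""
    dbt = dbtRunner()
    for command in ("run", "test"):
        res = dbt.invoke([command, "--project-dir", project_dir])
        assert res.success, f"dbt {command} failed: {res.exception}"
        # Run results report 'success', test results report 'pass'.
        failures = [r for r in res.result if r.status not in ("success", "pass")]
        assert not failures, f"Broken nodes: {[r.node.name for r in failures]}"
```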

4. Show me an example! 👾

So far, the code is quite simple, and it gives us a way to test our DBT code in isolation: we simulate different potential scenarios and make sure any future code change preserves our expected outcomes and keeps our code free of bugs.

All we will need to provide for each test is:

  • sources_to_load: the CSV file plus the source table reference to copy the data to.
  • selector: a dbt select statement to validate a subset of models. Useful if you already have many dbt models, so you can roll out this practice model by model.
  • outputs_to_validate: the CSV file plus the target table whose content we will fully validate.
A minimal sketch of such a test follows; see our test code for a deeper understanding.
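Here is what such a test can look like (validate_dbt is a hypothetical helper wrapping the steps above; the paths and table names are illustrative):

```python
def test_orders_model(postgresql):
    # Load the inputs, run dbt on the selected model, and compare the output.
    validate_dbt(
        connection=postgresql,
        sources_to_load={"tests/data/raw_orders.csv": "raw.orders"},
        selector="orders",
        outputs_to_validate={"tests/data/expected_orders.csv": "analytics.orders"},
    )
```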

That's all! With a little code, we have enabled ourselves to unit-test our dbt models with different input files, and we can guarantee to our data consumers that our dbt quality stays on track.

Feedback is a gift; we hope that by sharing our DBT testing experience, we have helped you improve your DBT quality too.

We encourage you to share your experiences 💌 and approaches to DBT model testing and validation. Your feedback is invaluable, and we look forward to learning from your insights.
