dbt test? How to stop executions based on results from non-dbt tests

Pedro Lencina · SDG Group · Mar 11, 2024

Those of us who are familiar with dbt are aware of the great utility that tests offer us, both generic data tests (not_null, unique, referential integrity) and singular data tests.

These tests help us perform data validations at both a technical and functional level in a simple manner, just by parameterizing them in the definition of the models. dbt tests are executed as part of a ‘dbt build’, after completing a ‘dbt run’, or independently using the ‘dbt test’ command.

Tests can be configured to either interrupt or not interrupt the execution of the DAG if one of them fails, depending on the criticality of the validation. The test results can be viewed in the execution terminal, both in the CLI version and in Cloud. In this article, we will focus on the former.
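To make that interrupt behavior concrete: in dbt it is controlled by the test’s severity. A minimal, hypothetical singular test that only warns instead of stopping the DAG could look like this (the model names are made up):

-- tests/assert_no_orphan_orders.sql: a hypothetical singular test.
-- severity = 'warn' reports failures without stopping the DAG;
-- the default, 'error', interrupts downstream execution in a 'dbt build'.
{{ config(severity = 'warn') }}

select o.order_id
from {{ ref('orders') }} o
left join {{ ref('customers') }} c
    on o.customer_id = c.customer_id
where c.customer_id is null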

So far, so good. However, from an organization’s observability point of view, this functionality can be somewhat limited.

If your project needs to keep track of the tests that have been executed and their results over time, you might need to add certain logic in dbt, starting by using the ‘--store-failures’ flag, or ‘store_failures_as’ if you are using version 1.7 onwards.
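On the CLI this is simply ‘dbt test --store-failures’. The same behavior can also be enabled per test; a minimal, hypothetical singular test could look like this:

-- tests/assert_customers_not_empty.sql: a hypothetical singular test.
-- store_failures writes the failing rows to a table in the audit schema.
{{ config(store_failures = true) }}
-- On dbt 1.7+, you can choose the materialization instead:
-- {{ config(store_failures_as = 'table') }}

select count(*) as record_count
from {{ ref('customers') }}
having count(*) = 0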

Furthermore, this flag materializes the results into a table that is overwritten with each execution. Therefore, if you want to maintain a history of the test executions performed on your models, you need to create other models that read from this table on each execution and materialize the results into incremental models to historize them.
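As a sketch of this historization, assuming one of the stored-failure tables has been declared as a source (all names here are hypothetical):

-- models/audit/test_failures_history.sql: a minimal sketch.
-- Each run appends a timestamped snapshot of the current failures,
-- since the table written by store_failures is overwritten every time.
{{ config(materialized = 'incremental') }}

select
    current_timestamp() as recorded_at,
    f.*
from {{ source('dbt_test__audit', 'unique_customers_customer_id') }} f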

Another point to consider is that your organization may already have implemented, or may want to implement, a process (for example, driven by metadata) that can perform tests on tables and columns in a simple and parameterizable way. In this case, your project would own the tests to be carried out and could add more complex ones, such as custom field formats, record existence, column existence, exact record counts, and other functional rules, all while storing the results in a way your organization can customize.

In addition to this, since tests are a feature unique to dbt, relying on them may pull you into a vendor lock-in spiral that your organization wants to avoid.

So, let’s assume we already have a process in place for validating our data. How can we integrate the results of such tests with dbt? Can we control the execution of a ‘dbt run’ and make it fail based on the results of our tests?

The answer is: YES

Ideally, we want the DAG execution to stop so that incorrect results are not propagated.

How can we achieve that?

To run ‘external’ tests in dbt and have their results stop a DAG, you might need to develop your project in a modular way and make use of MACROS. The macro should run after each module has finished so that, if an error occurs, the following module does not launch.

In our case, the project has been divided into several phases, based on its data persistency and functionality:

PSA: Persistent Staging Area
NPSA: Non-Persistent Staging Area
RDV: Raw Data Vault
BDV: Business Data Vault
IM: Information Mart

In our metadata, tests have also been divided into the same phases so that when each phase ends, a MACRO is triggered to execute all the SQL tests for that phase.

As explained before, the main idea is that if any test in that phase shows an error, our process should not proceed to the next phase.
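To make the wiring concrete, the last model of each phase can trigger the validations and the error check from its post-hooks. Both macros are sketched later in this article, and every name below is a hypothetical example, not our actual code:

-- models/rdv/hub_customer.sql: a hypothetical model closing the RDV phase.
-- The post-hooks first run and store the phase validations, then abort
-- the DAG if any of them produced an 'ERROR' record.
{{ config(
    post_hook = [
        "{{ run_phase_validations('RDV') }}",
        "{{ abort_on_validation_errors('RDV') }}"
    ]
) }}

select * from {{ ref('stg_customer') }}

If a post-hook raises an error, the model itself is marked as failed and everything downstream of it is skipped, which is exactly the behavior we are after.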

/*  Technical side note on the metadata process   */

The metadata process owns a series of SQL templates built to automate tests. At runtime, these templates are populated and the resulting SQL statements are generated and run.

For example, the test that checks if a table has records has a simplified SQL-Template like this:

--VR_SPECIFICATION_DEFINITION:

SELECT
COUNT(1) COD_METRIC
, '[COD_DATASET]' COD_DATASET
, '[COD_PHASE]' COD_PHASE
, '[COD_VALIDATION]' COD_VALIDATION
, CASE WHEN COUNT(1) > 0 THEN 'INFO' ELSE 'ERROR' END COD_THRESHOLD
, '-> Validation description: [DESC_VALIDATION]' DESC_VALIDATION
, CURRENT_TIMESTAMP() DT_LOG
FROM [COD_DATASET_ORIGIN_REAL] DataSet;

In our metadata process, we have decided to validate the existence of records in the tables below. To achieve this, we have added one line for each table to which we want to apply this validation.

The metadata would look something like this:

Metadata input that defines the datasets (tables) to run the validation against.

Our process will read the template defined above and replace the [TAGS] in that template with the metadata values in the Excel file, ultimately creating the SQL that will execute the test.
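For example, for a hypothetical CUSTOMERS table in the RDV phase, the generated SQL could look like this (all values here are illustrative, not taken from the actual metadata):

SELECT
COUNT(1) COD_METRIC
, 'CUSTOMERS' COD_DATASET
, 'RDV' COD_PHASE
, 'VAL_HAS_RECORDS' COD_VALIDATION
, CASE WHEN COUNT(1) > 0 THEN 'INFO' ELSE 'ERROR' END COD_THRESHOLD
, '-> Validation description: Table must contain records' DESC_VALIDATION
, CURRENT_TIMESTAMP() DT_LOG
FROM RDV.CUSTOMERS DataSet;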

/*  End of Technical side note  */

Returning to our dbt process, once each of the phases is completed, the MACRO is triggered in a post-hook (or by the next model). The MACRO code has to read all the validation SQLs corresponding to that phase, then execute these queries and store the results in a table.
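The exact implementation depends on where the generated SQLs live. Here is a minimal sketch of what such a macro could look like, assuming the generated statements are stored as rows of a metadata table (all table and column names here are assumptions):

{% macro run_phase_validations(phase) %}
{# A minimal sketch: assumes MY_VALIDATION_CATALOG holds one generated
   SELECT statement per validation, and that MY_VALIDATION_RESULTS_TABLE
   already exists with the columns produced by the templates. #}
{% if execute %}
    {% set validations = run_query(
        "select SQL_TEXT from MY_VALIDATION_CATALOG where COD_PHASE = '" ~ phase ~ "'") %}
    {% for row in validations.rows %}
        {# Run each validation SELECT and persist its single result row #}
        {% do run_query("insert into MY_VALIDATION_RESULTS_TABLE " ~ row['SQL_TEXT']) %}
    {% endfor %}
{% endif %}
{% endmacro %}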

Here is where the magic happens. The next step is to read the results from the validation table and call the ‘exceptions.raise_compiler_error’ function if an error record is found.

This function aborts a dbt DAG execution by making the model FAIL. In our case, we call it when we find an error value in our validation results table. This can be achieved this way:

Code that checks MY_VALIDATION_RESULTS_TABLE and stops the execution of dbt if it finds an “ERROR”.

(Please bear in mind that code has been shortened for simplicity)
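Since the original code is shown as a screenshot, here is a minimal sketch of the idea; the table and column names are assumptions, not the exact implementation:

{% macro abort_on_validation_errors(phase) %}
{# Reads the validation results and aborts the DAG if any 'ERROR' is found #}
{% if execute %}
    {% set results = run_query(
        "select count(*) from MY_VALIDATION_RESULTS_TABLE" ~
        " where COD_THRESHOLD = 'ERROR' and COD_PHASE = '" ~ phase ~ "'") %}
    {% set error_count = results.columns[0].values()[0] %}
    {% if error_count > 0 %}
        {{ exceptions.raise_compiler_error(
            'Phase ' ~ phase ~ ' has ' ~ error_count ~
            ' validation(s) in ERROR. Stopping the DAG.') }}
    {% endif %}
{% endif %}
{% endmacro %}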

The code shown above would generate the following error if it finds an empty table to which we apply a validation:

Shows the model that ERRORs
Shows the actual Error Message set in our Macro

This function makes it easy to use an organization’s own testing tools to control the execution of a DAG in dbt, allowing us, if necessary, to terminate it when the data does not meet the quality standards required for the next phase.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Why this article?

This strategy was used in a project that was migrated from a previous technology to dbt. The project already had a metadata-driven testing process in place, so we could not use dbt tests. This metadata testing process included more than 7,000 validations applied to the system, including, but not limited to, the existence of records, the existence of tables, data formats, date formats, referential integrity, customized rules, and functional rules.

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Hope you find it interesting!

Pedro Lencina

pedro.lencina@sdggroup.com

— — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — —

Pedro Lencina
SDG Group

Data Engineer with over 15 years of hands-on experience in the dynamic field of Data Technologies