Test SQL Pipelines against Production Clones using DBT and Snowflake

Published in

The Airtasker Tribe

6 min readOct 8, 2019

A difficult challenge of any software development is ensuring the code you’ve created has its intended effect. Over the years, the software field has evolved impressive approaches to testing, using a combination of tooling, practice and automation. Modern complex applications that used to require weeks of manual testing can now be tested in a matter of minutes without any human involvement.

The data field is far less mature in this area. It’s not uncommon for modern data pipelines to be subjected to time consuming manual testing. I believe the reason for this is twofold. Firstly, data development has only in the past few years started to adapt practices from software development (e.g. data/analytics engineering). More importantly though, testing data pipelines typically involves a combination of testing not only logic, but very complex state, and a lot of it, i.e. the current set of all records in the database. In my work as a data warehouse developer and engineer, it wasn’t uncommon for systems that worked flawlessly in DEV, SIT, and UAT to fail in production. This normally happened as testing occurred on smaller subsets of data to ensure a timely response, or because a unseen state had occurred in the data.

Being able to test your data pipelines against a current replica of production in an automated…

Test SQL Pipelines against Production Clones using DBT and Snowflake

Written by Dan Gooden