Unlocking Reliable Data and Team Efficiency with dbt Cloud: A Hands-On Experience

In today’s data-driven world, organizations face the challenge of managing and deriving insights from massive amounts of data. To tackle this, Snowflake and dbt have emerged as powerful tools that empower data teams to transform and analyze data with efficiency and scalability. In this blog, we will explore how to work with Snowflake in conjunction with dbt Cloud and introduce the Snowflake Quickstart Accelerating Data Teams with Snowflake and dbt Cloud Hands-On Lab.

As data teams grow in size and complexity, manual data transformation processes become cumbersome and error-prone. dbt Cloud addresses these challenges by providing a collaborative and automated environment for managing the entire data transformation lifecycle. After introducing dbt at a high level, we’ll highlight the key benefits covered in the quickstart that make dbt Cloud an indispensable tool for accelerating your data team’s productivity.

What is dbt?

dbt is the “T” (transform) in ELT and provides a framework for transforming your data. We’ve all had to transform data from raw sources into the cleaned, joined, and aggregated final models that business stakeholders use for their reporting. With dbt, every analyst can version control their work and make sure that it’s tested and documented. With dbt Cloud, you’re able to schedule and automate the refreshes of your data models at the frequency you need.
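To make that concrete, a dbt model is just a SELECT statement saved in a SQL file. Here is a minimal sketch of what a staging model such as stg_tpch_orders might look like (the column list is illustrative, and the source() call assumes the raw TPC-H tables have been declared as a dbt source):

-- models/staging/stg_tpch_orders.sql
select
    o_orderkey as order_key,
    o_custkey as customer_key,
    o_orderstatus as status_code,
    o_totalprice as total_price,
    o_orderdate as order_date
from {{ source('tpch', 'orders') }}

Thin staging models like this rename and clean columns so the heavier joins and aggregations downstream stay readable.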

dbt connects to your data in Snowflake and executes the defined transformations there. dbt does not move your data; it pushes those commands down to Snowflake, which is great from both a performance and a compliance standpoint. dbt can also manage source freshness checks and incremental data loading (sketched below), ensuring that only the necessary transformations are applied to updated data, and it runs data quality and consistency tests that let you validate your transformed data against defined expectations.

Fig 1. dbt sits in the middle of the data stack as a transformation tool on top of your warehouse.
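As an illustration of the incremental loading mentioned above, an incremental model only needs a materialization config and a filter on new rows. This is a sketch, assuming the staging model carries an order_date we can filter on:

-- models/marts/fct_orders.sql (simplified)
{{ config(materialized='incremental', unique_key='order_key') }}

select *
from {{ ref('stg_tpch_orders') }}

{% if is_incremental() %}
-- on incremental runs, only transform rows newer than what is already built
where order_date > (select max(order_date) from {{ this }})
{% endif %}

On the first run dbt builds the full table; on later runs it processes only the filtered rows and merges them into the existing table.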

How dbt works in a nutshell:

It works by allowing data engineers and analysts to define and manage data transformations using SQL and Python code in a structured and modular manner. dbt doesn’t make you rewrite the same boilerplate code over and over again; it encourages don’t-repeat-yourself (DRY) practices through concepts like reusable code snippets called macros (a small example follows). Under the hood, dbt treats your data transformations as a collection of SQL-based models, each representing specific analytical or business logic. These models are organized into a DAG (Directed Acyclic Graph) that defines dependencies and ensures proper execution order.
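For instance, a macro is a snippet you define once and call from any model. Here is a small hypothetical one, borrowed from a common dbt pattern (the model and column names are made up for illustration):

-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name) %}
    ({{ column_name }} / 100)::number(16, 2)
{% endmacro %}

-- called in any model:
select {{ cents_to_dollars('amount_cents') }} as amount_dollars
from {{ ref('stg_payments') }}

If the conversion logic ever changes, you update the macro in one place rather than in every model that uses it.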

Coming back to the DAG, here we can see in our TPC-H data what our fct_orders model relies on upstream and what it feeds into downstream (revenue is important to know!).

Fig 2. An automated lineage DAG for fct_orders to easily know our dependencies.

It’s worth calling out that the DAG is created by using the ref function in dbt.

Taking fct_orders as an example:

dbt syntax:

select * from {{ ref('stg_tpch_orders') }}

Compiled code executed on Snowflake:

select * from ANALYTICS.staging.stg_tpch_orders

Testing and describing our data is as easy as a few lines of YAML code (yes, really).

What tests look like in dbt:

models:
  - name: fct_orders
    description: orders fact table
    columns:
      - name: order_key
        description: primary key of the model
        tests:
          - unique
          - not_null

That’s it! The last three lines specify the tests; the lines above them specify the column we are running the tests on (order_key) and the model it belongs to (fct_orders).

The YAML code then creates SQL statements to test that your order_key values are all unique and not null.

Unique test SQL created:

select
    count(*) as failures,
    count(*) != 0 as should_warn,
    count(*) != 0 as should_error
from (
    select
        order_key as unique_field,
        count(*) as n_records
    from ANALYTICS.dbt_hwatson.fct_orders
    where order_key is not null
    group by order_key
    having count(*) > 1
) dbt_internal_test

Not null test SQL created:

select
    count(*) as failures,
    count(*) != 0 as should_warn,
    count(*) != 0 as should_error
from (
    select order_key
    from ANALYTICS.dbt_hwatson.fct_orders
    where order_key is null
) dbt_internal_test

Now that we know how dbt works, let’s highlight five takeaways from the Snowflake Quickstart Accelerating Data Teams with Snowflake and dbt Cloud Hands-On Lab:

1. Partner Connect:

Using dbt Cloud is the easiest way to get started with dbt, and with Snowflake Partner Connect you’re able to seamlessly create a dbt Cloud account and connect it to your Snowflake account in a couple of clicks. All of the connection details and permissions are taken care of for you, so you’re ready to start building out your data transformations immediately. Click here to learn more about Snowflake Partner Connect.

2. Collaborative Data Development:

dbt Cloud enables data analysts, engineers, and stakeholders to collaborate seamlessly. In dbt Cloud, analytics engineers, data engineers, and data scientists can see, contribute to, and leverage each other’s work. Its IDE allows team members to work together on projects, share models, and review code changes. This collaborative environment promotes knowledge sharing, enhances team productivity, and ensures that everyone is aligned on the data transformation logic. If you want to know more about all the great dbt Cloud IDE features, learn about them here.

3. Automated Data Transformations:

With dbt Cloud, data teams can automate all of their dbt transformations. It provides a centralized platform to schedule and orchestrate dbt runs, ensuring that data is processed reliably and consistently. The ability to automate these processes reduces manual effort and enables teams to focus on higher-value tasks such as analysis and modeling.

Fig 3. Creating jobs is easy in dbt Cloud, whether on a scheduled cadence or as part of a CI/CD-triggered process.
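For example, the command list for a scheduled production job is often as short as:

dbt run
dbt test

or simply dbt build, which runs models and their tests together in DAG order. dbt Cloud handles the scheduling, logging, and alerting around those commands.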

4. Version Control:

dbt Cloud integrates seamlessly with a variety of Git providers, enabling teams to track changes and collaborate effectively. When reviewing code in Git, team members are able to provide feedback on proposed changes, ensuring that data transformations adhere to best practices and are high quality. With dbt Cloud’s built-in CI, breaking changes are caught before they ever reach your production environment.

Fig 4. dbt Cloud’s built-in Git workflow allows everyone to feel comfortable version controlling their work.
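Under the hood, a CI job typically builds and tests only the models affected by a proposed change. A common pattern (assuming a production job whose state the CI run can compare against) is:

dbt build --select state:modified+

The state:modified+ selector picks up everything you changed plus its downstream dependents, so a pull request is validated without rebuilding the entire project.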

5. Scalability and Performance:

Leveraging the power of Snowflake’s cloud data platform, dbt Cloud lets you scale compute to your performance needs through simple configuration. As your team begins to handle larger volumes of data, dbt allows you to specify which warehouse to use when building each of your tables and views. As your data project grows, using dbt Cloud with Snowflake enables you to meet your goals in the most efficient and performant way possible.
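For example, with the dbt-snowflake adapter, a single expensive model can be pointed at a larger warehouse with a one-line config (the warehouse name here is hypothetical):

-- at the top of a model file
{{ config(snowflake_warehouse='transforming_xl_wh') }}

The rest of the project keeps using the default warehouse from your connection, so you pay for the larger compute only where it matters.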

Conclusion:

The Snowflake Quickstart Accelerating Data Teams with Snowflake and dbt Cloud Hands-On Lab showcases the power and benefits of using Snowflake and dbt Cloud together. Whether you’re a beginner analyst or a seasoned data professional, if you’d like to improve the quality of your data transformation processes, be sure to check out this quickstart!
