The Benefits of Using DBT for Data Transformation

Yaniv Ben Zvi
Israeli Tech Radar
Published in
5 min readNov 19, 2022

TL;DR DBT is a data transformation tool that offers many benefits for teams that need to quickly and efficiently transform data. DBT is user-friendly, provides a wide range of transformation options, and enables users to test and document data transformations.

Data Transformation (image source)

Data Transformation: What it is and Why You Need It?

As data analytics become more critical for businesses in recent years, the need for tools that can help teams quickly and efficiently transform data has also grown. One such tool is DBT, which offers many benefits for teams that need to transform data.

As data warehouse and business intelligence (DW/BI) operations become more complex, the traditional approach to data transformation- ETL (extract, transform, and load) — has struggled to keep pace.
The ELT (extract, load, and transform) approach has gained popularity in recent years as an alternative to ETL, but it still relies heavily on manual code development and is often difficult to manage at scale.

There are many different types of data transformations, including data cleansing, data enrichment, data normalization, and data aggregation. Each type of transformation has its own set of challenges, which can make it difficult to choose the right tool for the job.

What is DBT?

DBT is a data transformation tool that enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. DBT is designed to handle the challenges of data transformation, including data cleansing, data enrichment, data normalization, and data aggregation.

DBT provides several features that make it an ideal tool for data transformation, including a user-friendly interface, a wide range of transformation options, and the ability to test and document data transformations.

How DBT works

DBT works by first extracting data from its source, then loading the data into the DW as raw data, which is ready for transformation using the DBT tool. Once the data is loaded, the user can choose from a wide range of transformation options.

The DBT transforms the data already in your data warehouse. Transformations are expressed in simple SQL SELECT statements, and when executed, DBT compiles the code, infers dependency graphs, runs models in order, and writes the necessary DDL/DML to execute against your Snowflake instance.

That gives DBT an edge in performance since the data is being transformed in the data warehouse itself rather than being extracted, transformed, and then loaded back into the data warehouse.

After the transformation is complete, the user can then test the data to ensure it is accurate.

Finally, the user can document the transformation to share with others or to keep for future reference.

Data lifecycle (image source)

The Benefits of Using DBT

  • A user-friendly tool that offers a wide range of transformation options, making it ideal for data analysts and engineers who need to transform data quickly and efficiently.
  • Enables users to test and document data transformations, which helps to ensuring accuracy and to sharing knowledge with others on the data team.
  • An SQL-first tool, which makes it easy to use for teams that are already familiar with SQL.
  • DBT is designed to handle the challenges of data transformation, including data cleansing, data enrichment, data normalization, and data aggregation.

Getting started with DBT playground

If you want to explore DBT and see how it could help with your data transformation needs, you can clone the DBT demo project called “jaffle_shop” from GitHub and follow the instructions in the README file to get started.

Inside the jaffle_shop directory, you will find the following files and directories:

  • dbt_project.yml — this is the DBT project manifest file.
  • Models — this directory contains the SQL files that define the DBT models.
  • Seed — this directory contains the data files that are used to populate the DBT models.
  • Target — this directory contains the DDL files that are used to create the database tables.

To run the DBT project, you will need to have a database that is compatible with the DDL files in the target directory. The DDL files in the target directory are for a PostgreSQL database.

Since the purpose of the playground is to provide a safe environment for experimentation, we gonna use Postgres DB locally instead of a cloud data warehouse.

Prerequisites:

First clone jaffle_shop repository:

git clone git@github.com:dbt-labs/jaffle_shop.git

Create a `docker-compose.yaml` file with Postgres service as follow:

version: '3.5'

services:
postgres:
container_name: postgres_container
image: postgres
environment:
POSTGRES_USER: root
POSTGRES_PASSWORD: 1234
PGDATA: /data/postgres
volumes:
- postgres:/data/postgres
ports:
- "5432:5432"
restart: unless-stopped

volumes:
postgres:

Start the Postgres container service:

docker-compose up

Next, create the “profiles.yml” file to point to your new database.

config:
send_anonymous_usage_stats: true
use_colors: true
partial_parse: false
printer_width: 250
write_json: true
warn_error: true
log_format: default
debug: true
version_check: true
fail_fast: false
use_experimental_parser: false
static_parser: true

jaffle_shop:
target: dev
outputs:
dev:
type: postgres
host: 127.0.0.1
user: root
password: "1234"
port: 5432
dbname: postgres
schema: dbt_alice
threads: 4

Inspect the state of connection to the database by running the debug command and check that the DBT has access to the Postgres instance (CLI seed ref):

dbt debug

Loads CSV files from the /seed directory into the target database to provide some data to work with (CLI seed ref):

dbt seed

Once you have a compatible database, you can run the DBT run command to execute the DBT project (CLI run ref):

This will create the database tables and load the data into the tables.

dbt run

Tests are assertions you make about your models and other resources in your dbt project (e.g. sources, seeds and snapshots). When you run dbt test, dbt will tell you if each test in your project passes or fails.
source: DBT test documentation

Verify that the project’s models are behaving as expected by running the test (CLI test ref):

dbt test

Generates HTML documentation for your project (CLI docs ref):

dbt docs generate

Serves your generated documentation locally so you can view it in your browser (CLI docs ref):

dbt docs serve

DBT docs are a great way to share and track the transformations you have made with others and keep them for future reference.

Conclusion

If you are looking for a data transformation tool that helps you transform, test, and document data in the cloud data warehouse, DBT may be the right tool for you.

--

--