The Benefits of Using DBT for Data Transformation

Published in

Israeli Tech Radar

5 min readNov 19, 2022

TL;DR DBT is a data transformation tool that offers many benefits for teams that need to quickly and efficiently transform data. DBT is user-friendly, provides a wide range of transformation options, and enables users to test and document data transformations.

Data Transformation: What it is and Why You Need It?

As data analytics become more critical for businesses in recent years, the need for tools that can help teams quickly and efficiently transform data has also grown. One such tool is DBT, which offers many benefits for teams that need to transform data.

As data warehouse and business intelligence (DW/BI) operations become more complex, the traditional approach to data transformation- ETL (extract, transform, and load) — has struggled to keep pace.
The ELT (extract, load, and transform) approach has gained popularity in recent years as an alternative to ETL, but it still relies heavily on manual code development and is often difficult to manage at scale.

There are many different types of data transformations, including data cleansing, data enrichment, data normalization, and data aggregation. Each type of transformation has its own set of challenges, which can make it difficult to choose the right tool for the job.

What is DBT?

DBT is a data transformation tool that enables data analysts and engineers to transform, test, and document data in the cloud data warehouse. DBT is designed to handle the challenges of data transformation, including data cleansing, data enrichment, data normalization, and data aggregation.

DBT provides several features that make it an ideal tool for data transformation, including a user-friendly interface, a wide range of transformation options, and the ability to test and document data transformations.

dbt - Transform data in your warehouse

dbt is a data transformation tool that enables data analysts and engineers to transform, test and document data in the…

www.getdbt.com

How DBT works

DBT works by first extracting data from its source, then loading the data into the DW as raw data, which is ready for transformation using the DBT tool. Once the data is loaded, the user can choose from a wide range of transformation options.

The DBT transforms the data already in your data warehouse. Transformations are expressed in simple SQL SELECT statements, and when executed, DBT compiles the code, infers dependency graphs, runs models in order, and writes the necessary DDL/DML to execute against your Snowflake instance.

That gives DBT an edge in performance since the data is being transformed in the data warehouse itself rather than being extracted, transformed, and then loaded back into the data warehouse.

After the transformation is complete, the user can then test the data to ensure it is accurate.

Finally, the user can document the transformation to share with others or to keep for future reference.

The Benefits of Using DBT

A user-friendly tool that offers a wide range of transformation options, making it ideal for data analysts and engineers who need to transform data quickly and efficiently.
Enables users to test and document data transformations, which helps to ensuring accuracy and to sharing knowledge with others on the data team.
An SQL-first tool, which makes it easy to use for teams that are already familiar with SQL.
DBT is designed to handle the challenges of data transformation, including data cleansing, data enrichment, data normalization, and data aggregation.

Getting started with DBT playground

If you want to explore DBT and see how it could help with your data transformation needs, you can clone the DBT demo project called “jaffle_shop” from GitHub and follow the instructions in the README file to get started.

Inside the jaffle_shop directory, you will find the following files and directories:

dbt_project.yml — this is the DBT project manifest file.
Models — this directory contains the SQL files that define the DBT models.
Seed — this directory contains the data files that are used to populate the DBT models.
Target — this directory contains the DDL files that are used to create the database tables.

To run the DBT project, you will need to have a database that is compatible with the DDL files in the target directory. The DDL files in the target directory are for a PostgreSQL database.

GitHub - dbt-labs/jaffle_shop: A self-contained dbt project for testing purposes

jaffle_shop is a fictional ecommerce store. This dbt project transforms raw data from an app database into a customers…

github.com

dbt Developer Hub

Pulling data directly out of application databases is commonplace in the MDS, but also risky. Apps change quickly, and…

docs.getdbt.com

Since the purpose of the playground is to provide a safe environment for experimentation, we gonna use Postgres DB locally instead of a cloud data warehouse.

Prerequisites:

First clone jaffle_shop repository:

git clone git@github.com:dbt-labs/jaffle_shop.git

Create a `docker-compose.yaml` file with Postgres service as follow:

version: '3.5'

services:
  postgres:
    container_name: postgres_container
    image: postgres
    environment:
      POSTGRES_USER: root
      POSTGRES_PASSWORD: 1234
      PGDATA: /data/postgres
    volumes:
       - postgres:/data/postgres
    ports:
      - "5432:5432"
    restart: unless-stopped

volumes:
    postgres:

Start the Postgres container service:

docker-compose up

Next, create the “profiles.yml” file to point to your new database.

config:
  send_anonymous_usage_stats: true
  use_colors: true
  partial_parse: false
  printer_width: 250
  write_json: true
  warn_error: true
  log_format: default
  debug: true
  version_check: true
  fail_fast: false
  use_experimental_parser: false
  static_parser: true

jaffle_shop:
  target: dev
  outputs:
    dev:
      type: postgres
      host: 127.0.0.1
      user: root
      password: "1234"
      port: 5432
      dbname: postgres
      schema: dbt_alice
      threads: 4

Inspect the state of connection to the database by running the debug command and check that the DBT has access to the Postgres instance (CLI seed ref):

dbt debug

Loads CSV files from the /seed directory into the target database to provide some data to work with (CLI seed ref):

dbt seed

Once you have a compatible database, you can run the DBT run command to execute the DBT project (CLI run ref):

This will create the database tables and load the data into the tables.

dbt run

Tests are assertions you make about your models and other resources in your dbt project (e.g. sources, seeds and snapshots). When you run dbt test, dbt will tell you if each test in your project passes or fails.
source: DBT test documentation

Verify that the project’s models are behaving as expected by running the test (CLI test ref):

dbt test

Generates HTML documentation for your project (CLI docs ref):

dbt docs generate

Serves your generated documentation locally so you can view it in your browser (CLI docs ref):

dbt docs serve

DBT docs are a great way to share and track the transformations you have made with others and keep them for future reference.

GitHub - yanivbenzvi/dbt-workshop

jaffle_shop is a fictional ecommerce store. This dbt project transforms raw data from an app database into a customers…

github.com

Conclusion

If you are looking for a data transformation tool that helps you transform, test, and document data in the cloud data warehouse, DBT may be the right tool for you.

The Benefits of Using DBT for Data Transformation

Data Transformation: What it is and Why You Need It?

What is DBT?

dbt - Transform data in your warehouse

dbt is a data transformation tool that enables data analysts and engineers to transform, test and document data in the…

How DBT works

The Benefits of Using DBT

Getting started with DBT playground

GitHub - dbt-labs/jaffle_shop: A self-contained dbt project for testing purposes

jaffle_shop is a fictional ecommerce store. This dbt project transforms raw data from an app database into a customers…

dbt Developer Hub

Pulling data directly out of application databases is commonplace in the MDS, but also risky. Apps change quickly, and…

GitHub - yanivbenzvi/dbt-workshop

jaffle_shop is a fictional ecommerce store. This dbt project transforms raw data from an app database into a customers…

Conclusion

Written by Yaniv Ben Zvi