The Modern Data Stack: An Introduction to Data Build Tools (DBT) Part 1

Agusmahari
Data Engineering Indonesia
6 min read · Jun 7, 2024

A powerful transformation workflow for modular, reliable analytics code

Photo by Ricardo Gomez Angel on Unsplash

In Amegakure, there is a data engineer named Agus. With 3.5 years of experience in data engineering, Agus is known for his dedication to learning. He is always looking for ways to make his work more efficient and achieve better results.

One day, Agus read an article on Medium and discovered a tool called DBT (Data Build Tool), which changed how he works as a data engineer.

https://docs.getdbt.com/docs/introduction

What is a Data Build Tool (DBT)?

DBT is a magical tool, but it’s not a magic wand. DBT helps Agus manage and transform data more efficiently. It allows him to turn raw data into useful information, ready for data analysts to use. As an engineer, Agus can monitor all processes and ensure they run smoothly and safely using DBT. With DBT, Agus can modularize his analytics code, making it easier to develop and maintain.

Why Use Data Build Tool (DBT)?

After reading the documentation and practicing, Agus quickly realized that DBT brought many benefits to him and his team:

  • Work Faster: DBT allows Agus to modularize his analytical code into smaller, manageable parts, making his work much faster and more efficient.
  • Reduce Errors: With DBT, Agus can quickly test his data. He knows if something is wrong before the data is used, reducing the risk of errors in production.
  • Better Collaboration: Everyone on Agus’s team can work together in one place with clear rules. It’s similar to a software development team, where collaboration becomes more structured and effective.
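As a sketch of how such a data test might look, DBT lets you write a plain SQL file in the tests/ directory that returns the rows violating an expectation; `dbt test` fails if the query returns any rows. The model name `customers` here is hypothetical:

```sql
-- tests/assert_customer_id_not_null.sql (hypothetical singular test)
-- dbt test fails if this query returns one or more rows
select *
from {{ ref('customers') }}
where customer_id is null
```

Running `dbt test` executes every such query against the warehouse and reports any that return rows, so problems surface before the data reaches production.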

How Does Data Build Tool (DBT) Work?

Of course, you need to install and set it up first, but that’s for the next article.

First, Creating Models: Agus starts by writing SQL models. A DBT model is a SQL file containing a SELECT statement that defines a table or view to be created. Agus defines his first model in a SQL file named models/my_first_model.sql in his project directory.
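A minimal sketch of what such a model might look like (the raw source table name is an assumption for illustration):

```sql
-- models/my_first_model.sql (hypothetical example)
-- A dbt model is just a SELECT statement; dbt handles the DDL.
select
    id as customer_id,
    first_name,
    last_name
from raw.jaffle_shop.customers  -- assumed raw source table
```

Note that the file contains no CREATE TABLE or CREATE VIEW statement; DBT generates that boilerplate when the model runs.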

Second, Running the DBT Model: After writing his model, Agus runs it with the command dbt run. This command compiles and executes Agus’s SQL code in the database, creating tables or views in the data warehouse. Once the model is built, it can be tested within the DBT framework.
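Conceptually, dbt run wraps the model’s SELECT statement in the DDL for its configured materialization. For a view, the compiled SQL sent to the warehouse looks roughly like this (schema and table names are illustrative):

```sql
-- Illustrative compiled output of `dbt run` for a view materialization
create view analytics.my_first_model as (
    select
        id as customer_id,
        first_name,
        last_name
    from raw.jaffle_shop.customers
);
```

This is why DBT models stay so concise: the engineer writes only the SELECT, and DBT supplies the surrounding DDL.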

With DBT, Agus no longer feels overwhelmed by his tasks. Instead, he now enjoys every step of the data transformation process. DBT allows Agus to work more efficiently, reduce the risk of errors, and ensure that all data his team uses is always consistent and accurate. And so, Agus and his team continue to work successfully.

Features in DBT

Handle Boilerplate Code to Materialize Queries as Relations Whenever Agus creates a model in DBT, he can easily manage materializations. Think of materializations as strategies for building data and controlling query costs. Materializations create or update tables, views, or incremental tables in the warehouse. Convenient.
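A sketch of how a materialization can be chosen per model via a config() block at the top of the file (the model and column names are hypothetical):

```sql
-- models/orders_summary.sql (hypothetical example)
-- The config() block sets the materialization strategy for this model.
{{ config(materialized='table') }}  -- alternatives: view, incremental, ephemeral

select
    order_date,
    count(*) as order_count
from raw.jaffle_shop.orders
group by order_date
```

Switching the strategy (say, from view to table) changes only this one line; the SELECT logic stays untouched.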

Use a Code Compiler DBT supports Jinja, allowing Agus to create macros to save time by defining frequently used SQL transformation functions.
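For example, a small macro might look like this (a sketch based on a common pattern; the macro name and column are illustrative):

```sql
-- macros/cents_to_dollars.sql (hypothetical macro)
-- Reusable SQL snippet: converts a cents column to dollars.
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}
```

Any model can then call it, e.g. `select {{ cents_to_dollars('amount_cents') }} as amount_usd from {{ ref('payments') }}`, so the conversion logic lives in one place instead of being copy-pasted across models.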

Determine the Order of Model Execution When building a data mart, Agus knows it is built up from staging models, then intermediate models, which are gradually combined into final tables. Fortunately, DBT has a feature called ref, which lets Agus reference other models as data sources, ensuring everything runs in the correct order.
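A sketch of how ref ties models together (the staging model names are hypothetical): because the model references stg_orders and stg_customers via ref(), DBT knows to build those models first.

```sql
-- models/int_orders_enriched.sql (hypothetical example)
-- ref() declares dependencies, so dbt builds stg_orders and
-- stg_customers before this model.
select
    o.order_id,
    o.order_date,
    c.customer_name
from {{ ref('stg_orders') }} o
join {{ ref('stg_customers') }} c
    on o.customer_id = c.customer_id
```

From these ref() calls, DBT derives the full dependency graph (DAG) and executes the models in the right order automatically.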

Load Seed Files In building a data mart, Agus often needs external stakeholder data that is unavailable in databases or other storage. Usually, this requires static files like CSV or Excel. These files, known as seed files, can be stored as CSV files in his project and loaded into the data warehouse using the dbt seed command. With DBT, Agus can quickly and easily integrate this data into the data mart.
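For illustration, suppose a hypothetical seed file seeds/country_codes.csv with columns code and country_name. After `dbt seed` loads it into the warehouse, the seed can be referenced like any other model:

```sql
-- models/users_with_country.sql (hypothetical example)
-- country_codes is a seed loaded from seeds/country_codes.csv
-- via `dbt seed`; it is referenced with ref(), just like a model.
select
    u.user_id,
    c.country_name
from {{ ref('stg_users') }} u
left join {{ ref('country_codes') }} c
    on u.country_code = c.code
```

This keeps small, slowly changing lookup data version-controlled in the project alongside the models that use it.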

Snapshot Data Sometimes, data sources from application databases can change over time, making it difficult for Agus to reconstruct historical values. DBT provides a mechanism to snapshot raw data at specific points in time, allowing Agus to track data changes over time.
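A hedged sketch of a snapshot definition (the schema, unique key, and timestamp column are assumptions for illustration):

```sql
-- snapshots/customers_snapshot.sql (hypothetical example)
{% snapshot customers_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

-- The raw table being tracked; dbt records a new row each time
-- a customer's data changes, with validity timestamps.
select * from raw.jaffle_shop.customers

{% endsnapshot %}
```

Each `dbt snapshot` run compares the source against the stored history and adds validity columns (such as dbt_valid_from and dbt_valid_to), so Agus can reconstruct what a record looked like at any point in time.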

With all these features, Agus finds his work more accessible and the results better. He can collaborate more effectively with his team, reduce the risk of errors, and ensure that all the data they use is consistent and accurate.

Data Build Tool (DBT) Products

In the DBT documentation, Agus found two DBT products: DBT Cloud and DBT Core. After reading through the documentation, Agus gathered the following information:

https://www.getdbt.com/product/dbt-cloud

DBT Cloud:

DBT Cloud is a service provided by DBT, offering a beautiful and user-friendly web interface that allows for easy and quick implementation of DBT. Through this interface, Agus can:

  • Develop models
  • Test models
  • Schedule jobs: The job scheduler is the backbone of running jobs in DBT Cloud, bringing power and simplicity to building data pipelines in continuous integration and production contexts. The scheduler frees teams from developing and maintaining their own infrastructure and ensures the timeliness and reliability of data transformations.
An overview of a dbt Cloud job run
  • Visualize models: dbt Explorer provides an interactive visualization of the project’s DAG. To access the project’s full lineage graph, select Overview in the left sidebar and click the Explore Lineage button in the main (center) section of the page. If the lineage graph does not appear immediately, click Render Lineage; rendering can take some time, depending on the size of the project and the computer’s available memory. Very large projects might not render fully, in which case a subset of nodes can be selected using selectors instead. The nodes in the lineage graph represent the project’s resources, and the edges represent the relationships between them. Nodes are color-coded and carry icons according to their resource type.
Example of full lineage graph

However, Agus also noted a key consideration:

  • Cost: DBT Cloud has a subscription fee, which might concern Agus and his company if they have a limited budget.

DBT Core:

DBT Core is a set of open-source Python packages that allow Agus to transform data. It is suitable for those who prefer to manage DBT themselves and run it locally. Agus can install DBT Core via the command line; it includes the core features of DBT, such as documentation and data model testing, with the difference that it is self-hosted and self-maintained.
dbt Labs provides different support levels for different versions, which may include new features, bug fixes, or security patches:

  • Active — We will patch regressions for new bugs and include fixes for older bugs / quality-of-life improvements. We implement these changes when we have high confidence that they’re narrowly scoped and won’t cause unintended side effects.
  • Critical — Newer minor versions transition the previous minor version into “Critical Support” with limited “security” releases for critical security and installation fixes.
  • End of Life — Minor versions that have reached EOL no longer receive new patch releases.
  • Deprecated — dbt Core versions older than v1.0 are no longer maintained by dbt Labs nor supported in dbt Cloud.
https://docs.getdbt.com/docs/dbt-versions/core

See you in the next article →

Having 3.5 years of experience as an IT Developer and Data Engineer. Enthusiast in data ingestion, data transformation and warehousing, data concepts, and analysis using Google Cloud Platform, Azure Cloud, Microsoft Fabric, and the HDFS ecosystem, and content writer on Medium [Link]. Currently working with several data-related technologies such as Azure, Microsoft Fabric, Hive, Python, Airflow, Kubernetes, etc. Passionate about “cooking raw data” into data ready to be processed by Data Analysts to produce helpful information for decision support. Learning to write on Medium as a platform for learning and sharing what I have learned.

You can connect with me on LinkedIn or at Agusmahari@gmail.com.



Data Engineer | Big Data Platform at PT Astra International Tbk. Let's connect on LinkedIn: https://www.linkedin.com/in/agus-mahari/