How data build tool (dbt) helped save time and increase productivity at work

Kev
3 min read · Apr 7, 2024


In my first few weeks as a data analyst, I discovered that some aspects of my day-to-day work took up a lot of time:

A. A good part of my day went to investigating why the aggregated numbers in two reports looked different even though they reported similar metrics. These discrepancies usually came down to people using different definitions of how the numbers should be aggregated, or pulling from different data tables.

B. Writing SQL queries against the data tables ingested directly from the backend (into the data warehouse) involved a lot of cleaning and re-casting of data types. This made the queries long, difficult to understand, and slow to write.
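To illustrate (the table and column names here are hypothetical), a query written directly against a raw backend table tends to bury the actual analysis under boilerplate cleaning:

```sql
-- Hypothetical raw backend table: every query repeats this cleanup.
SELECT
    CAST(order_id AS INTEGER)                  AS order_id,
    CAST(created_at AS TIMESTAMP)              AS created_at,
    LOWER(TRIM(customer_email))                AS customer_email,
    CAST(amount_cents AS DECIMAL(12, 2)) / 100 AS amount_dollars
FROM raw.backend_orders
WHERE order_id IS NOT NULL
```

Multiply this by every query and every analyst, and small differences in the cleanup logic quietly turn into the reporting discrepancies described above.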

That’s when data build tool (dbt) came to the rescue. This tool proved to be a game-changer, as it fundamentally altered our data pipeline from the traditional ETL (Extract, Transform, Load) to ELT (Extract, Load, Transform). This paradigm shift empowered us, as analysts, to exert greater control over data transformation processes.

Explaining the ETL and ELT Pipelines

ETL pipeline

Extract:

  • Data is collected from various sources such as databases, files, or applications.

Transform:

  • Data is cleaned, standardized, and structured to fit a predefined format. For example, dates might be formatted uniformly, or different units of measurement might be converted to a common standard.

Load:

  • Transformed data is loaded into a data warehouse or data mart, where it’s stored and organized for future analysis.

The ETL data pipeline. Source: dbt

ELT pipeline

Extract:

  • In the ELT process, extraction remains the same as in ETL. Data is still gathered from various sources.

Load:

  • ELT flips the order of transformation and loading. Instead of transforming data before loading it into the data warehouse, raw data is loaded directly into the data warehouse.

Transform:

  • After data is loaded into the data warehouse, transformation occurs within the data warehouse or data mart itself.

The ELT data pipeline. Source: dbt
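In dbt, this Transform step is written as a SQL model that dbt materializes as a view or table inside the warehouse itself. A minimal sketch (the model, source, and column names are hypothetical):

```sql
-- models/staging/stg_orders.sql
-- dbt compiles {{ source(...) }} to the raw table's location in the
-- warehouse and materializes this SELECT as a view or table there,
-- so the transformation runs where the raw data already lives.
SELECT
    CAST(order_id AS INTEGER)     AS order_id,
    CAST(created_at AS TIMESTAMP) AS created_at,
    LOWER(TRIM(customer_email))   AS customer_email
FROM {{ source('backend', 'orders') }}
```

Downstream models then select from `{{ ref('stg_orders') }}` instead of the raw table, so the cleanup logic is written once and reused everywhere.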

The benefits of dbt were manifold:

  • Control Over Transformation: ELT allowed us to handle data transformation within our data warehouse, ensuring that we could tailor it precisely to our stakeholders’ needs. This newfound control was instrumental in aligning our data outputs with the expectations of end-users.
  • Version Control and Software Engineering Best Practices: With the integration of version control, continuous integration/continuous deployment (CI/CD), and testing, our data transformation processes became more robust and scalable. This not only enhanced the reliability of our analyses but also democratized the ELT process, enabling anyone proficient in SQL to contribute effectively.
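As a sketch of what that testing looks like in practice (model and column names are again hypothetical), dbt lets you declare tests in a YAML file alongside the models, and `dbt test` runs them as SQL assertions, typically as part of CI:

```yaml
# models/staging/schema.yml
version: 2
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique      # fails if any order_id repeats
          - not_null    # fails if any order_id is missing
```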

As a result, we established a clean data mart layer that served as the single source of truth for end-users. This not only eliminated the headaches of transforming data downstream, but also significantly reduced the need for report investigations, saving us precious time and resources.

P.S. These are my subjective experiences of some challenges I faced in my new role. These challenges may not be common to everyone, but I hope this provides some inspiration and guidance to anyone facing similar ones.


Kev

A mid-career switcher into the world of data, still learning and growing every day