dbt Mesh: Powering Data Mesh — The Ultimate Guide

Alice Bui
Joon Solutions Global
Mar 19, 2024

While dbt is a powerful tool for data transformation, dbt Mesh unlocks its full potential within the Data Mesh architecture. This comprehensive guide delves into both concepts. We’ll explore how dbt Mesh builds upon the core principles of Data Mesh, empowering domain-specific data ownership and collaboration. This guide also identifies who can benefit the most from this approach, walks you through the implementation process, and provides solutions for potential challenges.

What is Data Mesh? What problems does it try to solve?

Pain points of data teams

In the data monolith approach, a single team often handles every stage: ingestion, processing, and serving.

Data Monolith Approach — Image by the author

This approach works well at a small scale but breaks down at a larger scale, where maintenance becomes painful for the central data team:

  • Hundreds of PRs waiting for approval at the end of the day, creating a heavy workload for the central team
  • Longer CI/CD run times as the number of models grows
  • A higher chance of code conflicts

Furthermore, monolithic systems rarely have clear contracts or boundaries. This means that data formatting changes upstream can break an untold number of downstream consumers.
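To make that ripple effect concrete, here is a minimal Python sketch of how a single upstream change fans out to its consumers. It assumes a hypothetical lineage stored as a plain dict that maps each model to the models that consume it directly; the model names are invented for illustration, and this is not how dbt itself stores lineage.

```python
from collections import deque

# Hypothetical lineage: each model maps to the models that consume it directly.
LINEAGE: dict[str, list[str]] = {
    "stg_orders": ["fct_orders", "int_order_items"],
    "int_order_items": ["fct_orders", "mart_inventory"],
    "fct_orders": ["mart_finance", "mart_marketing"],
    "mart_inventory": [],
    "mart_finance": [],
    "mart_marketing": [],
}


def downstream_of(model: str, lineage: dict[str, list[str]]) -> set[str]:
    """Return every model that transitively depends on `model` (breadth-first search)."""
    affected: set[str] = set()
    queue = deque(lineage.get(model, []))
    while queue:
        child = queue.popleft()
        if child in affected:
            continue  # already reached through another path
        affected.add(child)
        queue.extend(lineage.get(child, []))
    return affected


if __name__ == "__main__":
    # A formatting change in stg_orders reaches every model below it.
    print(sorted(downstream_of("stg_orders", LINEAGE)))
    # ['fct_orders', 'int_order_items', 'mart_finance', 'mart_inventory', 'mart_marketing']
```

Without a contract at the boundary, every model in that result set is exposed to the upstream change.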

Data Mesh principles

Data mesh emerged to solve the problems above. A data mesh is a decentralized data management architecture built around domain-specific data.

Data Mesh Architecture — Source: dbt Labs

A data mesh framework enacts the following principles:

Data Mesh Principles — Source: dbt Labs

Why is dbt Mesh the ideal match to Data Mesh?

dbt Mesh helps you operationalize a data mesh. It isn’t a single feature; it’s a pattern enabled by the convergence of several dbt features:

  • In line with the first principle (domain-driven), dbt offers cross-project references, which help you split your data into domain-driven projects
Example of dbt Multi-Projects — Source: dbt Labs
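As a rough mental model of cross-project references, the Python sketch below treats each domain project as an object that publishes some public models and declares which upstream projects it depends on. In dbt itself, a downstream model uses a two-argument `ref`, for example `{{ ref('jaffle_finance', 'monthly_revenue') }}`, and declares the upstream project in `dependencies.yml`. The `Project` class and `resolve_ref` function here are hypothetical and only illustrate the checks involved.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """Hypothetical stand-in for a domain-owned dbt project."""
    name: str
    public_models: set[str] = field(default_factory=set)
    # Upstream projects this project has declared as dependencies.
    dependencies: set[str] = field(default_factory=set)


def resolve_ref(caller: Project, registry: dict[str, Project], project: str, model: str) -> str:
    """Illustrate the checks behind a two-argument ref('project', 'model').

    Returns a qualified model name, or raises ValueError if the reference is not allowed.
    """
    if project not in caller.dependencies:
        raise ValueError(f"{caller.name} has not declared a dependency on {project}")
    upstream = registry[project]
    if model not in upstream.public_models:
        raise ValueError(f"{project}.{model} is not a public model")
    return f"{project}.{model}"


if __name__ == "__main__":
    finance = Project("jaffle_finance", public_models={"monthly_revenue"})
    marketing = Project("jaffle_marketing", dependencies={"jaffle_finance"})
    registry = {p.name: p for p in (finance, marketing)}
    print(resolve_ref(marketing, registry, "jaffle_finance", "monthly_revenue"))
    # jaffle_finance.monthly_revenue
```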

Availability

Who can benefit the most from dbt Mesh?

Scenarios & How dbt Mesh can solve — Image by the author

Based on these observations, I think data mesh holds the most promise for:

  • Businesses with complex or fast-changing domains or business lines (e.g., supply chain, logistics, e-commerce)
  • Businesses with sensitive or expensive data that needs to be isolated (e.g., banking, financial services)
  • Businesses with a decentralized structure of data teams

How the dbt Mesh mechanism works

Because dbt Mesh is a new way of working, adopting it can pose many challenges. Let’s look at how dbt Mesh works behind the scenes so that we can plan the implementation better.

Cross-project collaboration

  • Project dependencies: Dependencies between projects are acyclic. For example, if project B depends on project A, a new model in project A cannot import and use a public model from project B.
  • Upstream project maintenance: If the maintainers of an upstream project want to remove a model (or change its access modifier), that is a breaking change for downstream consumers of the model. They should first mark the model for deprecation (using `deprecation_date`), which delivers a warning to all downstream consumers of that model.
  • Triggering upstream models in other projects: Running `dbt build --select +model` will not trigger a run of upstream models in other projects unless those upstream projects are installed as packages (i.e., as source code).
  • Orchestrating job runs across multiple projects: dbt Cloud will soon offer the ability to trigger a job when another job completes, including a job in a different project.
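The first two rules above can be sketched in Python. This is a hypothetical illustration, not dbt’s implementation: project dependencies are modeled as a dict from each project to its upstream projects, and the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) rejects any cycle by raising `CycleError`. The `deprecation_warnings` helper is also made up; it only mimics the idea that a deprecation date lets upstream maintainers warn consumers before removing a model.

```python
from datetime import date
from graphlib import CycleError, TopologicalSorter

# Hypothetical project graph: each project maps to the upstream projects it depends on.
PROJECT_DEPS: dict[str, set[str]] = {
    "project_a": set(),
    "project_b": {"project_a"},  # project B depends on project A
}


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an upstream-first build order, raising CycleError if the graph has a cycle."""
    # static_order() is lazy, so list() forces the cycle check to run here.
    return list(TopologicalSorter(deps).static_order())


def deprecation_warnings(deprecations: dict[str, date], today: date) -> list[str]:
    """Hypothetical helper: one warning per model that has a deprecation date set."""
    warnings = []
    for model, when in sorted(deprecations.items()):
        status = "was deprecated on" if when <= today else "will be deprecated on"
        warnings.append(f"{model} {status} {when.isoformat()}")
    return warnings


if __name__ == "__main__":
    print(build_order(PROJECT_DEPS))  # ['project_a', 'project_b']

    # A new model in project A that references project B would close a cycle.
    cyclic = {**PROJECT_DEPS, "project_a": {"project_b"}}
    try:
        build_order(cyclic)
    except CycleError:
        print("rejected: project dependencies must be acyclic")

    print(deprecation_warnings({"dim_customers": date(2024, 6, 30)}, today=date(2024, 3, 19)))
```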

Permissions & access

  • Role-based access control (RBAC): dbt Cloud Enterprise plans support RBAC, which manages granular permissions for users and user groups. You can control which users can see or edit all aspects of a dbt Cloud project.
  • Model access: Access modifiers define where a model can be referenced. Public models can be referenced anywhere, including from other projects. Protected models can only be referenced within the same project. Model groups enable more granular control: private models can only be referenced by models in the same group.
  • Maintaining visibility on the entire organizational DAG: If a central data team member has permissions (at least read-only access) on all projects in a dbt Cloud account, they can navigate the organization’s entire DAG in dbt Explorer and see models at every level of detail.
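The model access rules above boil down to a small decision function. This Python sketch is a simplified, hypothetical model of dbt’s public, protected, and private access levels (with protected as the default, as in dbt); the `Model` class and `can_reference` function are illustrative names, not part of dbt.

```python
from dataclasses import dataclass
from typing import Literal, Optional

Access = Literal["private", "protected", "public"]


@dataclass(frozen=True)
class Model:
    """Hypothetical, simplified view of a model's access settings."""
    name: str
    project: str
    access: Access = "protected"  # protected is dbt's default access level
    group: Optional[str] = None


def can_reference(target: Model, caller: Model) -> bool:
    """Return True if `caller` is allowed to reference `target`."""
    if target.access == "public":
        return True  # public models can be referenced from any project
    if target.project != caller.project:
        return False  # protected and private models stay inside their project
    if target.access == "protected":
        return True
    # Private: only models in the same (non-empty) group may reference it.
    return target.group is not None and target.group == caller.group


if __name__ == "__main__":
    revenue = Model("fct_revenue", "finance", access="public")
    payments = Model("stg_payments", "finance")  # protected by default
    ledger = Model("int_ledger", "finance", access="private", group="accounting")
    close = Model("month_end_close", "finance", group="accounting")
    campaigns = Model("mart_campaigns", "marketing")

    print(can_reference(revenue, campaigns))  # True: public, cross-project
    print(can_reference(payments, campaigns))  # False: protected, other project
    print(can_reference(ledger, close))  # True: private, same group
```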

High-level decisions when implementing

To adopt dbt Mesh, you’ll need to consider these high-level areas:

  • Splitting projects: How do you determine where to split your DAG? Which models go in which project?
  • Git strategy: Mono-repo (multiple dbt projects in the same repository) or multi-repo (one repository per project)?

Splitting projects

The following signals can inform where to split your project:

  • Examine your jobs: which sets of models are most often built together?
  • Look at your lineage graph: how are models connected?
  • Look at the selectors defined in `selectors.yml`: how do people already define resource groups?
  • Talk to teams about what kind of separation naturally exists today.
3 ways to split your projects — Image by the author
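As one way to act on the first signal (which models are built together), the Python sketch below counts how often each pair of models appears in the same job, then groups models whose pairs co-occur at least `min_count` times using a small union-find. The job history, the threshold, and the heuristic itself are assumptions for illustration; treat the output as a starting point for conversations with teams, not a final split.

```python
from collections import Counter
from itertools import combinations

# Hypothetical job history: each entry is the set of models one job built.
JOBS: list[set[str]] = [
    {"stg_orders", "fct_orders", "mart_finance"},
    {"stg_orders", "fct_orders"},
    {"fct_orders", "mart_finance"},
    {"stg_campaigns", "mart_marketing"},
    {"stg_campaigns", "mart_marketing", "fct_orders"},
]


def co_build_counts(jobs: list[set[str]]) -> Counter:
    """Count how often each (sorted) pair of models is built in the same job."""
    counts: Counter = Counter()
    for models in jobs:
        for pair in combinations(sorted(models), 2):
            counts[pair] += 1
    return counts


def candidate_projects(jobs: list[set[str]], min_count: int = 2) -> list[set[str]]:
    """Group models linked by pairs that were built together at least `min_count` times."""
    parent = {model: model for job in jobs for model in job}

    def find(model: str) -> str:
        # Union-find lookup with path halving.
        while parent[model] != model:
            parent[model] = parent[parent[model]]
            model = parent[model]
        return model

    for (a, b), count in co_build_counts(jobs).items():
        if count >= min_count:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for model in parent:
        groups.setdefault(find(model), set()).add(model)
    # Sort for a deterministic result.
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    for group in candidate_projects(JOBS):
        print(sorted(group))
    # ['fct_orders', 'mart_finance', 'stg_orders']
    # ['mart_marketing', 'stg_campaigns']
```

Raising `min_count` yields smaller, tighter candidate projects; lowering it merges models that are only occasionally built together.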

Git strategy

  • Small-to-medium-sized team: mono-repo setup
  • Large team: multi-repo setup

Solutions for potential challenges

While dbt Mesh offers a powerful approach to data management, there are some potential roadblocks to consider during implementation.

  • Shifting Mindset: Moving from a centralized data team to a decentralized model requires a cultural shift, with domain teams needing to embrace data ownership and collaboration.
  • Monitoring and Observability: With data spread across multiple projects, monitoring data pipelines and identifying potential issues can be difficult.
  • Standardization and Governance: Decentralization can lead to inconsistency in data quality, coding practices, and documentation.
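To keep standards consistent across decentralized projects, some teams automate governance checks in CI. The Python sketch below assumes a hypothetical, simplified metadata dict per model (it is not the real `manifest.json` schema) and an assumed policy that every public model should have an enforced contract and a description. dbt does support model contracts, but the policy and the `governance_issues` helper here are illustrative only.

```python
# Hypothetical, simplified model metadata; field names are loosely inspired by dbt
# model properties (access, contract, description), not the real manifest schema.
MODELS: dict[str, dict] = {
    "fct_orders": {"access": "public", "contract_enforced": True, "description": "One row per order."},
    "dim_customers": {"access": "public", "contract_enforced": False, "description": ""},
    "stg_payments": {"access": "protected", "contract_enforced": False, "description": ""},
}


def governance_issues(models: dict[str, dict]) -> list[str]:
    """Flag public models that lack an enforced contract or a description."""
    issues = []
    for name, meta in sorted(models.items()):
        if meta.get("access") != "public":
            continue  # only models exposed to other projects are checked here
        if not meta.get("contract_enforced", False):
            issues.append(f"{name}: public model without an enforced contract")
        if not meta.get("description", "").strip():
            issues.append(f"{name}: public model without a description")
    return issues


if __name__ == "__main__":
    for issue in governance_issues(MODELS):
        print(issue)
```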

The challenges lie mostly in process rather than technology. The good news is that we’ve got a plan for you! It includes, but is not limited to:

  • Team structure: Setting up three major teams, each with clear areas of responsibility, to cover the dbt Mesh architecture
Data mesh team structure — Image by the author
  • Team process: Best practices for managing access across models and maintaining upstream models, plus CI/CD workflows and coding best practices.

Want to delve deeper into dbt Mesh best practices? Let’s chat!

Analytics Engineer @ Joon Solutions | dbt, Looker, Airflow Certified