DBT — Project Structure

Chaim Turkel
Published in Israeli Tech Radar
4 min read · Oct 19, 2022

So you are now starting your first dbt project. I assume you have already gone over the jaffle_shop project to see what a dbt project looks like. In addition, you should have had a look at the best practices page.

Looking a bit further ahead, you know that your company needs a more complex project hierarchy, but you are not sure how to structure it.

There is always a debate over whether to create a single mono-repo or separate git repos. Each approach has pros and cons; for a summary, see one or many.

The basic building blocks of every dbt project are:

  1. The model (sql file)
  2. The model metadata (yml file)
  3. The model documentation (md file)

In addition, since dbt is made for transformation, you also know that you will have the following layers:

  1. source (bronze)
  2. staging (silver)
  3. marts (gold)

I will share with you the best practices that we have built over the last year at Yotpo.

To describe the whole system, we will start from the smaller parts and work our way up.

Model

Each model needs SQL, YAML, and documentation. We suggest creating a folder for each model that holds its sql, yml, and md files. This keeps the project well organized.
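The one-folder-per-model convention can be checked mechanically. The following is a minimal sketch, not part of our actual tooling; it assumes (hypothetically) that each file is named after its folder, for example orders/orders.sql, orders/orders.yml, and orders/orders.md. The function name missing_model_files is made up for illustration.

```python
from pathlib import Path

# The three building blocks every model folder should contain.
REQUIRED_SUFFIXES = (".sql", ".yml", ".md")


def missing_model_files(model_dir: Path) -> list[str]:
    """Return the required file names that are absent from one model folder.

    Assumption: files share the folder's name, e.g. orders/orders.sql.
    """
    name = model_dir.name
    return [
        f"{name}{suffix}"
        for suffix in REQUIRED_SUFFIXES
        if not (model_dir / f"{name}{suffix}").is_file()
    ]
```

Running a check like this in CI over every model folder would flag models that were added without metadata or documentation.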

Model Layers

Since we are in the ETL business, we want to separate each layer of our data lake.

Source / Bronze / Private

Source models are models that we do not create but import from an existing system. We put their yaml files in the source directory.

Staging / Silver / Protected

This layer renames columns to a single standard. If needed, we also cast columns so that types follow a single standard across our system. If needed, we can also filter out all test and debug columns and rows.

The staging layer usually has two sub-layers. The base layer should just clean up the source table and nothing else. If you have more business logic, or if you want to join the table with another one, then the model belongs in the marts_compatible folder.

Marts / Gold / Public

This layer is the contract layer. Anything in this layer is exposed to anyone who wants to use the data. Each model here should be the final product, designed after careful thought about what consumers need. Try as much as possible not to change this layer once it is deployed, since others will depend on it.

Exposures

Exposures make it possible to define and describe a downstream use of your dbt project, such as in a dashboard application. We put all our exposures in an exposures folder, with a sub-folder for each type of exposure.

Since all objects in dbt can have metadata, we found that creating exposures for data we want to export outside of dbt keeps things well organized.

So, for example, if we automatically export models to Looker (based on tags), we will create an exposure for each Looker report.

Another example could be scheduling. If we want to group models to decide when the orchestrator should run them, we group them into an exposure and keep the details of the schedule in that exposure.
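To make the scheduling idea concrete, here is a minimal sketch of how an orchestrator-side script might group models by schedule. It assumes the exposure yml files have already been parsed into dictionaries, and that the schedule lives under a meta key named schedule; that key name and the function models_by_schedule are hypothetical, chosen for illustration. Only single-argument ref('model') entries in depends_on are handled.

```python
import re
from collections import defaultdict

# Matches a single-argument dependency such as ref('orders').
REF_PATTERN = re.compile(r"ref\(\s*['\"]([^'\"]+)['\"]\s*\)")


def models_by_schedule(exposures: list[dict]) -> dict[str, list[str]]:
    """Group model names by the schedule stored in each exposure's meta.

    Assumption: the schedule is kept under meta["schedule"]; exposures
    without that key are skipped.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for exposure in exposures:
        schedule = exposure.get("meta", {}).get("schedule")
        if schedule is None:
            continue
        for dependency in exposure.get("depends_on", []):
            match = REF_PATTERN.fullmatch(dependency.strip())
            if match:  # ignore source(...) and other dependency forms
                grouped[schedule].append(match.group(1))
    return dict(grouped)
```

Each resulting group could then be turned into one scheduled run of the listed models.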

Models Summary

We have now organized all our models so that it is simple to see what is where and which layer of our data lake each model belongs to.

Domains

Most companies have domains within the company, and the data is usually local to a domain.

To reflect this in the project, we have the following structure and rules.

The root folder is the domain name, and you can have one more level of a sub-domain. Under the domain folder or sub-domain folder you will have the full model structure we described above.
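The folder convention above means a model's domain, optional sub-domain, and layer can be read straight from its path. The sketch below illustrates this; the layer folder names (source, staging, marts, exposures) and the helper locate are assumptions made for the example, and paths are taken relative to the models root.

```python
from pathlib import PurePosixPath
from typing import NamedTuple, Optional

# Assumed folder names for the layers described in this article.
LAYERS = {"source", "staging", "marts", "exposures"}


class ModelLocation(NamedTuple):
    domain: str
    sub_domain: Optional[str]
    layer: str


def locate(path: str) -> ModelLocation:
    """Derive domain, optional sub-domain, and layer from a model path.

    Accepts domain/layer/... or domain/sub_domain/layer/...; anything
    else raises ValueError.
    """
    parts = PurePosixPath(path).parts
    for index, part in enumerate(parts):
        if part in LAYERS:
            if index == 1:
                return ModelLocation(parts[0], None, part)
            if index == 2:
                return ModelLocation(parts[0], parts[1], part)
            break  # layer folder at the root or nested too deeply
    raise ValueError(f"path does not follow the domain layout: {path}")
```

For example, locate("billing/invoices/marts/invoice/invoice.sql") yields the domain billing, the sub-domain invoices, and the layer marts.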

The rules that need to be followed are:

  • You cannot reference a source (private) or stage (protected) model from another domain. This way, any mistakes or changes can be handled locally without affecting anyone else.
  • All scheduling is also done at the domain level. Within Airflow, we will not have two domains in one DAG. You can have a sensor between DAGs, so you can still have dependencies.
  • Interconnections between domains should be at a minimum, and there should be no cycles between domains. If a need arises for a cycle, then the domain needs to be split.
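Two of these rules, no cross-domain references into private or protected layers and no cycles between domains, lend themselves to an automated check. The following is a minimal sketch under stated assumptions: models maps each model name to a (domain, layer) pair, refs is a list of (referencing model, referenced model) pairs, and both function names are hypothetical. How those inputs are extracted from the project is left out.

```python
from collections import defaultdict
from typing import Optional

# Layers that may only be referenced from inside their own domain.
RESTRICTED_LAYERS = {"source", "staging"}


def private_reference_violations(models, refs):
    """Return (from_model, to_model) pairs that cross a domain boundary
    into a source or staging model."""
    violations = []
    for src, dst in refs:
        src_domain, _ = models[src]
        dst_domain, dst_layer = models[dst]
        if src_domain != dst_domain and dst_layer in RESTRICTED_LAYERS:
            violations.append((src, dst))
    return violations


def domain_cycle(models, refs) -> Optional[list]:
    """Return one cycle of domains as a list, or None if there is none."""
    graph = defaultdict(set)
    for src, dst in refs:
        a, b = models[src][0], models[dst][0]
        if a != b:
            graph[a].add(b)

    in_progress, done = set(), set()

    def visit(node, stack):
        # Depth-first search; meeting an in-progress node closes a cycle.
        in_progress.add(node)
        stack.append(node)
        for nxt in sorted(graph.get(node, ())):
            if nxt in in_progress:
                return stack[stack.index(nxt):] + [nxt]
            if nxt not in done:
                found = visit(nxt, stack)
                if found:
                    return found
        stack.pop()
        in_progress.discard(node)
        done.add(node)
        return None

    for node in sorted(graph):
        if node not in done:
            found = visit(node, [])
            if found:
                return found
    return None
```

If domain_cycle reports a cycle, that is the signal mentioned above that one of the domains involved should be split.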

Summary

As you can see, building a big project in dbt can be difficult, but if you follow some basic guidelines, your project will stay maintainable even when it gets very big.
