dbt cheat sheet: 17 dbt Commands You Should Start Using Today

Bruno Souza de Lima
Indicium Engineering
12 min read · Jul 22, 2022

Trying to remember a specific dbt command or find the one that serves your needs? Save your time by using this dbt cheat sheet.

As you may know, dbt is a very powerful tool for data transformation. If you don’t know what it is, you should definitely read about it at https://docs.getdbt.com/docs/introduction.

dbt has lots of features to make analytics engineers’ lives easier. However, to unlock the full power of this tool and to use it efficiently, you have to examine its documentation carefully and practice a lot. When you read the dbt docs you can find new features that solve in an obvious way the problems you have been facing, or even realize you could have been using better transformation strategies.

To make this learning path a bit lighter, I made a cheat sheet with the main commands and flags you should use in your transformations with dbt. I hope it will help people learn about features they didn’t know dbt provides, and give experienced professionals a place to recall commands quickly, instead of spending minutes looking for them in the documentation.

In this post, I will explain in more detail the commands and arguments present in this cheat sheet. The descriptions are based on the documentation available at https://docs.getdbt.com/, and most examples are taken from there. If you want more information about a specific topic, I strongly recommend reading the online docs.

So, this is the cheat sheet (you can download the PDF version from the link at the end of the post):

Figure 1 — dbt commands cheat sheet

Now let’s dive into it!

Commands and Flags

The upper part of the cheat sheet shows the main commands of dbt and some very useful flags. Although some of the command descriptions are self-explanatory, I want to comment on a few of them.

Main commands

Some commands are very well known and there is not much to add to the description:

  • dbt init: Initializes a new dbt project.
  • dbt run: Runs all models within the project.
  • dbt test: Runs all tests within the project.
  • dbt snapshot: Executes snapshots in the snapshot-paths defined in the dbt_project.yml file.
  • dbt seed: Loads csv files found in the seed-paths defined in the dbt_project.yml file.
  • dbt build: dbt run + dbt test + dbt snapshot + dbt seed (in DAG order).
  • dbt deps: Downloads dependencies listed in the packages.yml file.

Now things start to get a bit more interesting. Let me talk about these ones:

  • dbt clean: This command is helpful in situations where, for any reason, you have to delete the same folders frequently. Just list them in the clean-targets list inside the dbt_project.yml file and let dbt clean do this work for you.
  • dbt compile: Have you ever done a “dbt run” and got surprised by compilation errors in some models? You can try “dbt compile” before “dbt run” to catch those errors early, which will save you time.
  • dbt debug: Shows some useful information about your machine config, such as python and dbt versions, python path, OS info, paths of profiles.yml and dbt_project.yml. It tells you if the profiles.yml and dbt_project.yml were found and if they are valid. It also gives information about the connection and tests it, and informs you if the required dependencies were found.
$ dbt debug
Running with dbt=0.20.2
dbt version: 0.20.2
python version: 3.8.10
python path: /home/#####/#####/venv/bin/python
os info: Linux-5.15.0-41-generic-x86_64-with-glibc2.29
Using profiles.yml file at /home/#####/.dbt/profiles.yml
Using dbt_project.yml file at /home/#####/#####/#####/dbt_project.yml
Configuration:
profiles.yml file [OK found and valid]
dbt_project.yml file [OK found and valid]
Required dependencies:
- git [OK found]
Connection:
method: service-account-json
database: #####
schema: #####
location: None
priority: None
timeout_seconds: #####
maximum_bytes_billed: #####
Connection test: OK connection ok
  • dbt list: Lists all resources of the project. It is very useful for large projects or for cases when you don’t have access to a graphical interface. It gives cleaner results when used together with node selection (I will talk about node selection soon).
  • dbt parse: Parses the project. It will check if your project is correctly structured. It will also give performance time information. This information can be used to optimize the building time of large projects.
  • dbt source: This command has two available subcommands, “dbt source snapshot-freshness” (old version) and “dbt source freshness” (new version). It tells you how fresh your source tables are, for example, when each source was last updated. You can also configure warnings to be raised if the source data is too old.
  • dbt rpc: Can be used to run a Remote Procedure Call (RPC) dbt server. I don’t want to get into details about it, just to warn you that it will be fully deprecated by the end of 2022. If you are reading this after 2022, you can ignore the command.
  • dbt run-operation: Invokes a macro directly from the command line, which makes it a great command to test macros. If you don’t use macros, you should totally consider using them.
  • dbt docs: One of the most useful commands! That is why I dedicated a space in the cheat sheet for it. It generates your project’s documentation. Documentation is extremely important, especially in large projects! The command can be run in two forms, “dbt docs generate” and “dbt docs serve”. The first subcommand generates your project’s documentation website and can be used with the --no-compile flag to skip re-compilation. The second subcommand starts a local web server that serves the generated documentation.
Figure 2 — Main commands
Figure 3 — dbt docs

Global and other useful flags

The global flags shown in the cheat sheet are quite self-explanatory:

  • --version: Shows the installed version and latest version of dbt. It also shows the version of the plugins.
Installed version: 0.20.2
latest version: 1.0.0
Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- bigquery: 0.20.2
- redshift: 0.20.2
- postgres: 0.20.2
- snowflake: 0.20.2

Yes, I have to update my dbt :)

  • --record-timing-info: Saves profiling information to a file, which can then be visualized using snakeviz.
Figure 4 — Global CLI Flags

Now I want to talk more about other flags I found tremendously handy:

  • --help: This is gold! Help flags are always good. It can be used with almost all commands, describes them, and gives instructions on how to use them.
  • --vars: Another great flag for code generalization. Allows you to supply variables to dbt models.
  • --full-refresh: If you run models incrementally (and you should do it for cost optimization), this flag lets you run the model in full refresh mode.
  • --fail-fast: If dbt encounters an error in dbt run or fails a test in dbt test, it will stop the execution and will not run the rest of the models/tests.
  • --threads: Allows you to specify the number of threads. Nice to reduce execution time, but make sure that increasing the number of threads will not cause problems (for example, too many concurrent queries on your warehouse).
Figure 5 — Other flags
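
Since the cheat sheet does not show these flags in action, here is a minimal sketch of how they can be combined in one command. It only assembles and prints the command line, it does not call dbt. The variable names (start_date, country) and the thread count are made up for illustration, and I am assuming you pass --vars as a dictionary written as a JSON string.

```python
import json
import shlex

# Hypothetical variables; a model would read them with {{ var("start_date") }}.
variables = {"start_date": "2022-01-01", "country": "BR"}

command = [
    "dbt", "run",
    "--vars", json.dumps(variables),  # a JSON object is also a valid YAML dictionary
    "--full-refresh",                 # rebuild incremental models from scratch
    "--fail-fast",                    # stop at the first error
    "--threads", "4",                 # hypothetical thread count
]

# shlex.join quotes the --vars payload so it can be pasted into a shell.
command_line = shlex.join(command)
print(command_line)
```

The quoting matters here: the whole dictionary has to reach dbt as a single argument, which is why the printed command wraps it in single quotes.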

Node selection

A node in dbt is nothing more than a resource in a dbt DAG (directed acyclic graph). Thus, it can be a model, test, source, seed, snapshot, exposure, or analysis. The DAG below, for example, is composed of sources, models, and exposures, and all of them are considered nodes.

Figure 6 — Example of a dbt’s DAG. Source: https://github.com/dbt-labs/dbt-project-maturity

I already showed you commands to interact with nodes. For instance, the “dbt run” command will run all the models of your project. Unless you really need to run all the models of your project, doing this may not be a good idea in terms of processing time and costs. Fortunately, dbt provides you with a handful of ways to specify which nodes to select.

Syntax overview

First of all, let’s take a look at the syntax overview section of the cheat sheet. There are specific commands that work with node selection, and each of them has its own list of acceptable arguments.

Figure 7 — Syntax overview

Let’s have a brief description of each argument.

  • --select (-s): The most common and, if I may say, most important node selection flag. This flag is used to specify the nodes you want to include in your selection list. In earlier versions of dbt, the flag --models (-m) was used. The --models flag is still supported.
  • --exclude: Using this flag you can specify which nodes you want to exclude from your command.
  • --selector: Allows you to specify selectors described in the selectors.yml file.
  • --resource-type: This lets you limit the selection by a resource type.
  • --defer: Defer is a more complex flag. As the dbt documentation puts it, “Defer is a powerful feature that makes it possible to run a subset of models or tests in a sandbox environment, without having to first build their upstream parents.” For more details check https://docs.getdbt.com/reference/node-selection/defer.

In this post I will focus mainly on the --select and --exclude flags, because they have more semantic options and are by far the most used ones.

Specifying resources with “–select”

As I said, --select is by far the most used flag for node selection, and it accepts several types of arguments.

For the sake of simplicity, I will show examples using the “dbt run” command. Don’t forget, however, that the --select flag can be used with the other commands shown in the “Syntax Overview” section.

As described in the cheat sheet, you can specify one of the following:

  • package’s name: dbt will run all the models in the package/project.
  • model name: dbt will run the specific model.
  • fully-qualified path to a directory: dbt will run all the models inside the directory.
  • selection method: dbt will run all the models that match with the selection method. The selection methods will be described in the following section.

More examples can be seen in the image below, taken from the cheat sheet.

Figure 8 — Specifying resources

Selection methods

Selection methods are one type of argument for node selection. These methods allow dbt to run commands, such as “dbt run”, on nodes that share a common characteristic. When you use a selection method, you must insert a colon (:) between the method name/key and the method value, with no spaces. The possible selection methods are shown in the cheat sheet.

Figure 9 — Methods

Let’s take a look at each method’s description. I will assume, again, that we are using the “dbt run” command. And since the cheat sheet doesn’t show examples of each method, I will write them here:

  • tag: Specifying a tag, dbt will run all models associated with that tag.
    - $ dbt run --select tag:monthly
  • source: Using the source method, dbt will run all models that select from that source. With “dbt run”, it has to be used with the plus (+) operator on the right side, because the source itself is not a model.
    - $ dbt run --select source:orders+
  • path and package: The path method can be used to specify a path, and the package method a package. It is not mandatory to use these methods, as seen in the ‘Specifying resources with “--select”’ section. You can use them to make your command more explicit.
    - $ dbt run --select path:path/to/my/model
    - $ dbt run --select package:my_package
  • config: Specifying a config, dbt will run all models having that config. The config key is separated from “config” using a dot (.), and the colon separates the config key and value.
    - $ dbt run --select config.materialized:incremental
  • test_type: Test type is used with the “dbt test” command. It takes one of two values, singular or generic, which specify the type of test you want to execute.
    - $ dbt test --select test_type:generic
    - $ dbt test --select test_type:singular
  • test_name: Test name is used with the “dbt test” command. This method allows you to execute all tests with a specific generic name, such as unique and not_null.
    - $ dbt test --select test_name:not_null
  • state: Allows you to run only new nodes (using the “new” value) or modified ones (using the “modified” value). dbt knows whether a node is new or modified by comparing the project against the manifest file of a previous run, whose location you pass with the --state flag. You can also make subselections on modified nodes (check https://docs.getdbt.com/reference/node-selection/methods#the-state-method).
    - $ dbt run --select state:new --state path/to/artifacts
    - $ dbt run --select state:modified --state path/to/artifacts
  • exposure and metric: Specifying an exposure or a metric, dbt will run the parent resources of that exposure or metric. It has to be used with the plus (+) operator on the left side.
    - $ dbt run --select +exposure:my_exposure
    - $ dbt run --select +metric:my_metric
  • result: Can be used to select only resources which resulted in errors (using the “error” value) or failures (for tests, using the “fail” value) on the prior execution. You can, for example, run only the models which raised errors on the prior dbt run. Like the state method, it reads the artifacts of that prior execution, so it needs the --state flag.
    - $ dbt run --select result:error --state path/to/artifacts
    - $ dbt test --select result:fail --state path/to/artifacts
  • source_status: Used to execute resources based on the freshness of the related source. Check https://docs.getdbt.com/reference/node-selection/methods#the-source_status-method.
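
To make the “method:value” syntax concrete, here is a small toy parser in Python. It is only a sketch of the naming rule described above (method, optional key after a dot, colon, value, no spaces), not dbt’s actual parser. The names MethodSpec and parse_method are my own, and graph operators such as the plus sign are deliberately not handled.

```python
from typing import NamedTuple, Optional


class MethodSpec(NamedTuple):
    method: str          # e.g. "tag", "config", "state"
    key: Optional[str]   # only set for dotted methods such as "config.materialized"
    value: str           # everything after the first colon


def parse_method(spec: str) -> MethodSpec:
    """Split a 'method[.key]:value' selector into its parts (toy parser)."""
    if ":" not in spec or " " in spec:
        raise ValueError(f"expected 'method:value' with no spaces, got {spec!r}")
    name, value = spec.split(":", 1)
    method, _, key = name.partition(".")
    return MethodSpec(method, key or None, value)


for spec in ("tag:monthly", "config.materialized:incremental", "state:modified"):
    print(parse_method(spec))
```

Note how only “config” carries a key: the dot belongs to the method name, while the colon always separates the name from the value.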

Graph operators

Graph operators are operators that can be used along with node selection arguments. For the examples in this section, consider the DAG shown when I described a node in “Node selection”.

Figure 10 — Example of a dbt’s DAG. Source: https://github.com/dbt-labs/dbt-project-maturity

The available graph operators are:

  • Plus operator (+): Using this operator, the selection extends to parent resources (if the plus operator is left-sided) or to child resources (if the plus operator is right-sided).
    - $ dbt run --select +int_billed_claim_amounts
Figure 11 -Current model and all parents are selected

    - $ dbt run --select int_billed_claim_amounts+

Figure 12 — Current model and all children are selected

    - $ dbt run --select +int_billed_claim_amounts+

Figure 13 — Current model and all parents and children are selected
  • N-Plus operator: Similar to the plus operator, but now you specify how many degrees of parents or children to include.
    - $ dbt run --select 1+int_billed_claim_amounts+1
Figure 14 — Current model and parents to the first degree and children to the first degree are selected
  • At operator (@): Similar to the plus operator. Placed before a model name, it selects the model, its children, and also the parents of those children.
    - $ dbt run --select @int_billed_claim_amounts
  • Star operator (*): Putting this operator at the end of a path, dbt will execute all nodes in this path.
    - $ dbt run --select path.to.models.*

More examples can be seen in the cheat sheet.
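
If you prefer to see the operators as code, the sketch below models them on a tiny DAG in Python. This is a toy model of the semantics described above, not dbt’s implementation. Only the name int_billed_claim_amounts comes from the examples; the other nodes and the shape of the graph are invented, so it is not the DAG from the figure.

```python
from collections import deque

# Hypothetical DAG written as child -> list of parents.
PARENTS = {
    "stg_claims": [],
    "stg_payments": [],
    "stg_members": [],
    "int_billed_claim_amounts": ["stg_claims", "stg_payments"],
    "fct_claims": ["int_billed_claim_amounts", "stg_members"],
    "claims_dashboard": ["fct_claims"],
}
# Invert the mapping to get parent -> list of children.
CHILDREN = {node: [] for node in PARENTS}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)


def walk(start, edges, max_depth=None):
    """Breadth-first walk along `edges`, going at most `max_depth` hops."""
    seen, queue = set(), deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth == max_depth:
            continue
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return seen


def ancestors(node, depth=None):
    """What '+node' (or 'N+node' when depth is given) adds to the selection."""
    return walk(node, PARENTS, depth)


def descendants(node, depth=None):
    """What 'node+' (or 'node+N' when depth is given) adds to the selection."""
    return walk(node, CHILDREN, depth)


def at(node):
    """'@node': the node, its descendants, and the ancestors of all of them."""
    selected = {node} | descendants(node)
    for member in list(selected):
        selected |= ancestors(member)
    return selected


model = "int_billed_claim_amounts"
print(sorted(ancestors(model)))       # parents added by +model
print(sorted(descendants(model, 1)))  # children added by model+1
print(sorted(at(model)))              # @model
```

In this toy graph, stg_members is neither a parent nor a child of the model, yet the at operator picks it up because fct_claims depends on it. That is the difference between “@model” and “+model+”.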

Figure 15 — Graph operators

Set Operators

Set operators can be of two types: unions and intersections.

  • Unions: Unions are used to execute more than one selection at a time. Using “dbt run” as an example, you can run more than one selection by separating them with a blank space.
    - $ dbt run --select +snowplow_sessions +fact_orders
    It will run “+snowplow_sessions” AND “+fact_orders”. Remember that the plus (+) operator on the left side selects the parent nodes, so this command will run both models and their ancestors.
  • Intersections: Intersections are used to execute the resources common to more than one selection. The selections are comma-separated (,), with no spaces around the comma. Using “dbt run”, for instance, dbt will take the models of each selection and run only the ones that appear in all selections.
    - $ dbt run --select +snowplow_sessions,+fact_orders
    It will run the common ancestors of “snowplow_sessions” and “fact_orders”.
    - $ dbt run --select marts.finance,tag:nightly
    It will run the models that are in the marts/finance folder AND have the nightly tag.
    - $ dbt test --select +fct_orders,test_type:generic
    It will run the generic tests of fct_orders and its ancestors.

The space operator is very useful to run more than one model without needing to split them into different commands. And the comma operator is very convenient if you want to combine more than one selection method or selection.
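
The difference between the space and the comma is easy to see with plain Python sets. The snippet below is a toy model, not dbt’s code: the parent lists are invented (only the two model names come from the examples above), and the resolve function, a name of my own, understands only the “+model” form.

```python
# Hypothetical DAG written as child -> list of parents.
PARENTS = {
    "stg_events": [],
    "stg_users": [],
    "stg_payments": [],
    "snowplow_sessions": ["stg_events", "stg_users"],
    "fact_orders": ["stg_users", "stg_payments"],
}


def with_ancestors(node):
    """Resolve '+node': the node plus everything upstream of it."""
    selected = {node}
    for parent in PARENTS[node]:
        selected |= with_ancestors(parent)
    return selected


def resolve(select_args):
    """Each item of select_args is one space-separated piece of --select.

    Commas inside an item intersect; separate items are unioned.
    """
    result = set()
    for arg in select_args:
        parts = [with_ancestors(piece.lstrip("+")) for piece in arg.split(",")]
        result |= set.intersection(*parts)
    return result


union = resolve(["+snowplow_sessions", "+fact_orders"])   # space-separated
common = resolve(["+snowplow_sessions,+fact_orders"])     # comma-separated
print(sorted(union))
print(sorted(common))
```

With this made-up graph, the union contains all five nodes, while the intersection keeps only stg_users, the single ancestor the two models share.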

Figure 16 — Set operators

Excluding models

All of the semantics applied to the --select flag can be applied to the --exclude flag. Instead of adding resources to the set that will be executed, the --exclude flag selects resources to remove from this set.

It is a very handy flag, especially if you want to select a large set of resources, but remove some resources that share a common characteristic.

  • $ dbt run --select my_folder.* --exclude tag:daily
    It will run all my models inside my_folder folder, except the ones tagged daily.
Figure 17 — Excluding models
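
In set terms, --exclude is just a difference: first build the selected set, then remove the excluded one. Here is a minimal Python sketch of the command above. The model names, folders, and tags are hypothetical, and this is only an illustration of the idea, not how dbt implements it.

```python
# Hypothetical models as (name, folder, tags).
MODELS = [
    ("orders", "my_folder", {"daily"}),
    ("customers", "my_folder", {"weekly"}),
    ("payments", "my_folder", set()),
    ("sessions", "other_folder", {"daily"}),
]

# --select my_folder.*  -> every model inside my_folder
selected = {name for name, folder, _ in MODELS if folder == "my_folder"}
# --exclude tag:daily   -> every model tagged daily, wherever it lives
excluded = {name for name, _, tags in MODELS if "daily" in tags}

to_run = selected - excluded
print(sorted(to_run))
```

Notice that sessions is in the excluded set but was never selected, so excluding it changes nothing. Only orders is actually removed from the run.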

Conclusion

Now I hope you can fully understand the cheat sheet. Maybe you learned some useful commands in this post. If you already knew all of them, at least the cheat sheet can help you remember them. If you are not satisfied with some explanation, I strongly recommend checking the dbt documentation. Feel free to share the cheat sheet with other analytics engineers.

Thank you for your time!

Cheat sheet in PDF:
https://github.com/bruno-szdl/cheatsheets/blob/main/dbt_cheat_sheet.pdf

dbt docs:
https://docs.getdbt.com/

Bruno Souza de Lima
Indicium Engineering

https://www.linkedin.com/in/brunoszdl/ #dbt #sql #snowflake #bigquery #databricks #analytics #analyticsengineer #data #elt