Onboarding: DBT 101

Shaojun Luo
Aug 6 · 3 min read

What’s DBT:

At Ro, we use the Data Build Tool (DBT) to integrate SQL commands into data pipelines. While Snowflake’s UI is used for lightweight analytic work, DBT is used for automation work that interacts with the remote database. Here is a demonstration of how we develop a data pipeline using DBT:

[Figure: the DBT data pipeline development workflow]

Typically, a DBT project consists of several models, each defined in a SQL file. When you execute DBT on a local machine, it compiles the project, generates documentation, and creates executable files for server deployment. These executables can be integrated with Airflow or other services to create ETL pipelines with scheduled tasks.
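
To give a concrete sense of what a model looks like: it is just a SELECT statement saved as a .sql file, and DBT wires models together into a dependency graph through the ref() function. Below is a minimal sketch with hypothetical model and column names:

-- models/example/active_users.sql (hypothetical)
-- DBT materializes this SELECT as a view or table, per the project config.
select
    user_id,
    signup_date
from {{ ref('stg_users') }}  -- ref() points at another model, building the DAG
where is_active = true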

Before you use DBT:

(1) Understand how Git works and clone the dataops repository.
(2) Have a Snowflake account set up.
(3) Set up DBT using the instructions in the dataops/build folder.
(4) Have a basic understanding of ETL pipelines.

Some tips:
(1) If you use Anaconda as your Python distribution, do not use conda to create the virtual environment. Use virtualenv as instructed.
(2) If you have both Python 2 and Python 3 installed, use pip3 to install the requirements.
(3) In .bash_profile, do not use your email as the SNOWFLAKE_USER variable; it will cause a syntax error. Use an arbitrary name that identifies you (see the sketch below).
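
To illustrate tip (3), here is a hypothetical .bash_profile entry (the exact variable names come from the setup instructions in dataops/build):

# ~/.bash_profile (hypothetical example)
# Use a plain identifier, not an email address: the @ causes a syntax error.
export SNOWFLAKE_USER=jane_doe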

Start Building a Project:

After finishing the setup, you can create a small test project of your own to make sure everything works. The official tutorial may help, but here we provide a more specific version for Ro’s environment:

  1. Open a terminal, navigate to the dataops/build folder, and activate the DBT virtual environment.
cd dataops/build
. env/bin/activate
  2. Type the following command to create a project named ro_onboard.
dbt init ro_onboard
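
DBT scaffolds a directory that looks roughly like the following (the exact layout varies between DBT versions):

ro_onboard/
├── dbt_project.yml      # project configuration
├── README.md
├── analysis/
├── data/
├── macros/
├── models/
│   └── example/
│       ├── my_first_dbt_model.sql
│       ├── my_second_dbt_model.sql
│       └── schema.yml
└── tests/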

3. Now, you should find a new folder called ro_onboard; enter it with cd ro_onboard. DBT has already created a simple project template. Open the project configuration file dbt_project.yml and modify the following fields:

name: '[name_your_project]'
...
profile: 'rodata'
...
models:
  [name_your_project]:
    example:
      materialized: view
...

You can also change the materialized setting to table so that compiling generates tables instead of views.
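
For example, switching the example models to tables is a one-line change in dbt_project.yml:

models:
  [name_your_project]:
    example:
      materialized: table  # was: view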

4. Inside the ro_onboard folder, execute the command:

dbt run

DBT will then compile (materialize) the models and create (1) a local folder called target with compiled SQL files that can serve as executables for Airflow and other programs, and (2) two tables under the ROMAN.[SNOWFLAKE_USER]_dbt schema, visible in the Snowflake UI. This is the default location of your DBT data objects (you should also put all your staging data objects here).
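
You can verify the result by querying the freshly built objects in a Snowflake worksheet. A sketch, assuming the default example model name generated by dbt init and jane_doe as your SNOWFLAKE_USER:

-- Run in the Snowflake UI; replace jane_doe with your own SNOWFLAKE_USER
select * from ROMAN.jane_doe_dbt.my_first_dbt_model;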

5. Now put your own SQL queries inside the models folder as .sql files and modify the corresponding fields in dbt_project.yml. Re-run the project to check that your scripts create new tables, named after your SQL files, in ROMAN.[SNOWFLAKE_USER]_dbt (a sketch follows below).
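
As an illustration of step 5, the model below builds on one of the example models; the file name and logic are hypothetical:

-- models/example/model_counts.sql (hypothetical)
-- After dbt run, this appears as ROMAN.[SNOWFLAKE_USER]_dbt.model_counts.
select
    count(*) as row_count
from {{ ref('my_first_dbt_model') }}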

If everything goes as expected, you should now be able to fully interact with the remote database. Whenever you have questions, DBT’s documentation is always a good reference.

Final Notes:

Most of Ro’s project files are in the dataops/build folder. Feel free to explore the more advanced features and logic:

  • Project configuration is in dbt_project.yml
  • Model tags are also in dbt_project.yml
  • Models are defined in models/
  • Model documentation and schema tests are defined in models/models.yml (see the sketch after this list)
  • Sources are defined in models/sources.yml
  • Macros are defined in macros/
  • Analysis (queries that will be compiled but not materialized, e.g. for exports) are defined in analysis/
  • “Data tests” (as opposed to the “schema tests” mentioned above) would be defined in tests/, though this project doesn't have any yet
  • “Seed data” would be stored in data/, though this project doesn’t have any yet
  • Code I wrote to help create/maintain this project, e.g. to convert models to use ref/source and to generate sources.yml, is in tasks/
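
As an illustration of the schema tests mentioned above, a minimal models/models.yml entry might look like this (model and column names are hypothetical):

version: 2
models:
  - name: active_users
    description: 'Users with an active subscription'
    columns:
      - name: user_id
        tests:
          - unique
          - not_null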

If you want to check the DAGs and descriptions of all Ro’s models, you can run:

dbt compile
dbt docs generate
dbt docs serve

This will pop up a page in your web browser where you can find the documentation for all of these models.
