DBT CLI Design

Chaim Turkel
Published in Israeli Tech Radar
4 min read · Oct 19, 2022

DBT is a great tool for transforming data within your data warehouse.

No tool is perfect, and as with any tool out there, you usually need to add your own custom code so that it fits better into your organization.

DBT comes in two flavors. You have DBT Cloud, a SaaS offering that hosts your transformations and schedules the runs of the DAGs you created. You also have DBT Core, the open source version of DBT.

DBT Core is a Python library that you run from the CLI. Currently DBT does not officially support a Python API for running the transformations (though it is possible).

CLI

So what is missing from DBT? In the future I plan to describe the system I was part of and its design for self-serve DBT models, but for now, let’s just say I would like to have code that generates boilerplate SQL code and YAML files.
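To make "boilerplate SQL and YAML" concrete, here is a minimal sketch of what such a generator could look like. It is not the code from our project: the function names, the `materialized='view'` default, and the one-YAML-file-per-model layout are assumptions made for illustration. Only the `version: 2` / `models:` layout and the `source()` and `config()` Jinja calls are standard dbt conventions.

```python
from pathlib import Path
from typing import List, Sequence


def render_model_sql(source_name: str, table_name: str) -> str:
    """Render a starter model that selects everything from a dbt source."""
    return (
        "{{ config(materialized='view') }}\n\n"
        "select *\n"
        f"from {{{{ source('{source_name}', '{table_name}') }}}}\n"
    )


def render_schema_yaml(model_name: str, columns: Sequence[str]) -> str:
    """Render a minimal dbt properties file describing one model."""
    lines = [
        "version: 2",
        "",
        "models:",
        f"  - name: {model_name}",
        "    columns:",
    ]
    lines += [f"      - name: {column}" for column in columns]
    return "\n".join(lines) + "\n"


def write_boilerplate(models_dir: str, model_name: str, source_name: str,
                      table_name: str, columns: Sequence[str]) -> List[Path]:
    """Write <model>.sql and <model>.yml into models_dir (hypothetical layout)."""
    target = Path(models_dir)
    target.mkdir(parents=True, exist_ok=True)
    sql_path = target / f"{model_name}.sql"
    yml_path = target / f"{model_name}.yml"
    sql_path.write_text(render_model_sql(source_name, table_name))
    yml_path.write_text(render_schema_yaml(model_name, columns))
    return [sql_path, yml_path]
```

A CLI command would then only need to collect the model name, source and columns from the user and call `write_boilerplate`.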

dbt cli

shell exec

When we started the project, this task was fairly straightforward. I created a Python library managed with Poetry. This library used the click library to provide a CLI REPL. Now we had a tool that could do anything I wanted, and when I needed to activate DBT, I used a shell command from Python to run the DBT CLI.
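The shell exec step can be sketched roughly as follows. This is an illustration under assumptions, not our actual code: the helper names are hypothetical, and the click command that would call them is left out so the example only needs the standard library. `--project-dir` and `--select` are regular dbt CLI options.

```python
import shlex
import subprocess
from typing import List, Optional


def build_dbt_command(subcommand: str, project_dir: str,
                      select: Optional[str] = None) -> List[str]:
    """Build the argument list for a dbt CLI call."""
    cmd = ["dbt", subcommand, "--project-dir", project_dir]
    if select:
        cmd += ["--select", select]
    return cmd


def run_dbt(subcommand: str, project_dir: str,
            select: Optional[str] = None) -> int:
    """Run dbt as a child process and return its exit code.

    Assumes the `dbt` executable is installed and on PATH.
    """
    cmd = build_dbt_command(subcommand, project_dir, select)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd, check=False).returncode
```

Passing the command as a list (rather than one string with `shell=True`) avoids quoting problems with model selectors and paths.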

So now I can package my Python library with Poetry and deploy it to Artifactory so that anyone in the organization can install and use it.

The problem is that for it to run, you must first install Python and the proper versions of DBT and its additional adapters (in our case, Spark).

docker

So we revisited our design and decided to package dbt into a Docker image. Now the CLI application uses docker run instead of shell exec. This way we control the version of dbt that you use, and can upgrade it as necessary.
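Switching from shell exec to docker run mostly means wrapping the same dbt arguments in a different command. Here is a minimal sketch, assuming a hypothetical image name (`example/dbt-spark:1.2.0`), a hypothetical mount point (`/usr/app`), and an image whose entrypoint is not already `dbt`. If your image sets `dbt` as its entrypoint, drop the explicit `"dbt"` element.

```python
import os
from typing import List, Sequence


def build_docker_command(dbt_args: Sequence[str], project_dir: str,
                         image: str = "example/dbt-spark:1.2.0") -> List[str]:
    """Wrap a dbt invocation in `docker run`.

    The image tag pins the dbt and adapter versions; the project directory
    is mounted into the container and used as the working directory.
    """
    host_dir = os.path.abspath(project_dir)  # docker -v needs an absolute path
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/usr/app",
        "-w", "/usr/app",
        image,
        "dbt", *dbt_args,
    ]
```

Upgrading dbt for everyone then comes down to publishing a new image tag and changing the default in the CLI.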

Codespaces

Once we had this up and running, we decided that the best idea would be to host our environment on Codespaces, which includes an online IDE for editing your model files.

Codespaces knows how to load a Docker image into its space and run your container as part of the online IDE.

So for this scenario we updated our Docker image, which already contained dbt, to also include the CLI package, so that it is available from the shell within Codespaces.

We now have a system that can run our Python CLI in multiple environments. If you run it from your local machine, a Docker image is used to run the actual dbt commands. In Codespaces, where a container is already running, the CLI is part of the image, and all dbt calls go through shell exec commands.
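The choice between the two modes can be sketched as a small dispatch function. How our CLI actually detects its environment is not covered here; this sketch assumes the `CODESPACES` environment variable, which GitHub documents as being set to `true` inside a codespace. The image name and mount point are hypothetical.

```python
import os
from typing import List, Mapping, Optional, Sequence

DBT_IMAGE = "example/dbt-spark:latest"  # hypothetical image name


def in_codespaces(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when running inside a codespace (assumed detection)."""
    env = os.environ if env is None else env
    return env.get("CODESPACES", "").lower() == "true"


def resolve_dbt_command(dbt_args: Sequence[str], project_dir: str,
                        env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Pick shell exec inside Codespaces, docker run everywhere else."""
    if in_codespaces(env):
        # dbt is already installed in the container we are running in.
        return ["dbt", *dbt_args]
    return [
        "docker", "run", "--rm",
        "-v", f"{project_dir}:/usr/app",
        "-w", "/usr/app",
        DBT_IMAGE,
        "dbt", *dbt_args,
    ]
```

Keeping the decision in one function means the rest of the CLI never needs to know where it is running.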

Downside of Codespaces

Codespaces is a great product and enabled us to deploy a zero-installation product. The downside is that it is based on Visual Studio Code, which is a very nice product, but does not compare to PyCharm from JetBrains.

Another downside is that Codespaces is hosted on the open internet, so there are a lot of security issues that you need to deal with, and you do not want to open up your internal LAN to the internet.

ECR

To give Codespaces access to AWS ECR, you need to configure an access key and a secret key so that it can download your Docker image.

New version

Codespaces keeps using the Docker image it downloaded the first time, unless you rebuild the container. We wanted to notify users every time there is a new version so that they can rebuild the container.

The best solution for this would have been to get the latest version from an API call to Artifactory. But since Artifactory sits inside the VPN, Codespaces cannot reach it.

Instead we gave Codespaces access to our CLI git repo (within the company's git namespace). This way we can shallow clone the repo and read the version from the Poetry file (pyproject.toml).
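The version check can be sketched like this. It is a simplified illustration: the function names are hypothetical, the version is read with a small line scanner rather than a TOML parser so it runs on older Python versions, and the comparison assumes plain numeric versions such as `1.4.0`.

```python
import re
import subprocess
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def read_poetry_version(pyproject_text: str) -> Optional[str]:
    """Return the version declared under [tool.poetry], if any."""
    in_poetry_section = False
    for line in pyproject_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("["):
            in_poetry_section = stripped == "[tool.poetry]"
        elif in_poetry_section:
            match = re.match(r'version\s*=\s*"([^"]+)"', stripped)
            if match:
                return match.group(1)
    return None


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0); numeric versions only."""
    return tuple(int(part) for part in version.split("."))


def is_newer(latest: str, installed: str) -> bool:
    return parse_version(latest) > parse_version(installed)


def fetch_latest_version(repo_url: str) -> Optional[str]:
    """Shallow clone the CLI repo and read its version (needs git and access)."""
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(
            ["git", "clone", "--depth", "1", "--quiet", repo_url, tmp],
            check=True,
        )
        return read_poetry_version((Path(tmp) / "pyproject.toml").read_text())
```

On startup the CLI can compare `fetch_latest_version(...)` with its own installed version and print a "please rebuild your container" message when `is_newer` returns true.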

User configuration

Our CLI has a user configuration file that allows a one-time setup, after which all the parameters are read from there. When you rebuild your Codespaces container, you lose all this data, so we also created a new git repo that stores all user files, with one branch per user.
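A branch-per-user store can be driven with a handful of plain git commands. The sketch below only builds those commands. The `users/` branch prefix, the function names and the commit message are assumptions for illustration, not our actual layout.

```python
import re
from typing import List


def user_branch(username: str) -> str:
    """Map a user name to a safe branch name (hypothetical naming scheme)."""
    safe = re.sub(r"[^A-Za-z0-9._-]+", "-", username.strip()).strip("-.")
    if not safe:
        raise ValueError("username has no usable characters")
    return f"users/{safe}"


def restore_command(repo_url: str, username: str, target_dir: str) -> List[str]:
    """Command that fetches only this user's branch after a container rebuild."""
    return [
        "git", "clone", "--depth", "1", "--single-branch",
        "--branch", user_branch(username), repo_url, target_dir,
    ]


def save_commands(username: str, config_dir: str) -> List[List[str]]:
    """Commands that commit the config files and push them to the user's branch.

    Note: `git commit` exits with an error when nothing has changed, so a
    caller should tolerate that case.
    """
    branch = user_branch(username)
    return [
        ["git", "-C", config_dir, "add", "-A"],
        ["git", "-C", config_dir, "commit", "-m", "Update user configuration"],
        ["git", "-C", config_dir, "push", "origin", f"HEAD:{branch}"],
    ]
```

Each command list can be passed to `subprocess.run` as is, so restoring the configuration becomes one extra step when the container is created.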

Summary

DBT is a great tool, but it needs some polishing before it can easily be used as the main tool in a company (unless you go for dbt Cloud). To overcome this, we created a Docker image that includes the DBT version and adapters we need, plus our own enhanced CLI.

Users can run DBT locally or, as we prefer, use the hosted IDE in Codespaces.
