dbt Setup and Installation

Drew Paszek · Published in DataLakeHouse · 3 min read · Mar 4, 2022
High-level overview of how dbt aids in the ELT process (image: https://www.getdbt.com/ui/img/png/analytics-engineering-dbt.png)

For the uninitiated, dbt is a data analytics tool that offers a wide array of transformation capabilities. It can play a critical role in a modern data stack by performing the ‘T’ (Transform) in ‘ELT.’ It does this through “models”: SQL scripts that each boil down to a single SELECT statement. On top of models, dbt provides a number of configurations, such as environment variables, tests, and sources, that make data transformation as versatile, yet seamless, as possible. For more information about dbt and what makes it such a powerful data transformation tool, check out this article from dbt as well as DataLakeHouse’s most recent article about it.
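To make the idea of a model concrete, here is a minimal sketch of one; the file name, source name, and column names are hypothetical, and the `{{ source() }}` reference assumes a matching source has been declared in the project’s YAML:

```sql
-- models/stg_orders.sql (hypothetical model)
-- dbt materializes this single SELECT as a view or table named stg_orders.
select
    order_id,
    customer_id,
    order_total
from {{ source('shop', 'raw_orders') }}
where order_total > 0
```

Running `dbt run` compiles the Jinja reference and executes the resulting SELECT in your warehouse.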

Today, we are going to dig a little deeper into how to get started with dbt from a setup perspective. dbt offers a few different options for installing its CLI solution on one’s local machine.

Homebrew

If you plan on using a macOS machine for your local dbt development and will be connecting via PostgreSQL, Redshift, Snowflake, or BigQuery, Homebrew is one method available for dbt installation. If Homebrew is not already installed on your machine, you can install it here. From there, open a terminal and run the following commands:

brew update
brew install git
brew tap dbt-labs/dbt

If you know which data source you will connect to, you can then execute

brew install dbt-<source>

where <source> is either “postgres,” “bigquery,” “redshift,” or “snowflake.” For more information about using specific dbt versions or adapter versions, refer to dbt’s official documentation.
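For example, to install the Snowflake adapter (the choice of adapter here is purely illustrative) and confirm the install worked:

```shell
# Install the Snowflake adapter; this pulls in dbt-core as a dependency
brew install dbt-snowflake

# Confirm the installation and report the installed dbt and adapter versions
dbt --version
```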

Python/pip

For those planning to use dbt on a Windows or Linux OS, you will need to use pip. dbt highly recommends that your pip installation be done inside of a virtual environment. Managing Python requirements and dependencies can get messy quickly, and installing dbt in a virtual environment gives you an environment dedicated to dbt. The following commands, run on the command line, create and activate a virtual environment called ‘dbt-env.’

python3 -m venv dbt-env         # create the environment
source dbt-env/bin/activate     # activate the environment (macOS/Linux)
dbt-env\Scripts\activate        # activate the environment (Windows)

Once the virtual environment is created and activated, you can install dbt with pip.

pip install dbt-<source>
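Whichever install method you choose, the dbt CLI reads warehouse connection details from a profiles.yml file, by default located in ~/.dbt/. Below is a minimal sketch for a Postgres connection; the profile name, credentials, and the DBT_PASSWORD environment variable are all hypothetical placeholders:

```yaml
# ~/.dbt/profiles.yml (hypothetical example)
my_project:            # must match the profile named in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      port: 5432
      user: analytics
      password: "{{ env_var('DBT_PASSWORD') }}"   # keep secrets out of the file
      dbname: analytics
      schema: dbt_dev
      threads: 4
```

With this in place, `dbt debug` can be used to verify the connection.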

Docker

As of dbt-core version 1.0.0, dbt supports development inside of a Docker container. The Docker image is available on Fishtown Analytics’ (now dbt Labs’) Docker Hub. For more information on Docker and developing inside of Docker containers, refer to Docker’s official guide to setup and use.
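As a sketch, you could pull the image and run dbt commands with your project and profiles directories mounted into the container. The image tag, mount paths, and working directory below are assumptions, so check the image’s documentation before relying on them:

```shell
# Pull the dbt image (tag is illustrative)
docker pull fishtownanalytics/dbt:1.0.0

# Run `dbt run` against the project in the current directory,
# mounting the local profiles directory into the container
docker run --rm \
  -v "$(pwd)":/usr/app \
  -v "$HOME/.dbt":/root/.dbt \
  fishtownanalytics/dbt:1.0.0 \
  run
```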

dbt Cloud

For those who aren’t familiar with a command line interface, or who want a pain-free way to use dbt for deployments, job scheduling, job monitoring, and more, dbt offers a Cloud solution. dbt Cloud offers the ability to develop code, schedule and run jobs, and integrate code promotion and CI/CD, all in one location. Unlike the dbt CLI, dbt Cloud is not open source, but it does offer a Free Tier so you can at least use the platform on a limited basis. More information can be found on dbt’s website.

As mentioned, dbt is an incredibly versatile tool, and it can be made all the better when paired with a full ELT tool such as DataLakeHouse.io. DataLakeHouse.io is a no-setup, no-code ELT solution that seamlessly migrates and transforms data so you can focus on what your data is telling you instead of agonizing over the hassle of pipelining and modeling it.

To learn more about DataLakeHouse.io, visit our site 👈
