Schedule dbt models with Apache Airflow using a Docker container
This blog offers guidelines on using a Docker container to trigger dbt models with Apache Airflow.
Follow the steps below to trigger dbt models.
Step 1: Install dbt on the local system.
The command for installing dbt with the Snowflake adapter (note that package and command names are lowercase):
pip install dbt-snowflake
Check the dbt version:
dbt --version
Link to install dbt: Install with pip | dbt Developer Hub (getdbt.com)
Step 2: Install Docker on the local system.
Link to install Docker: Get Docker | Docker Documentation
Step 3: Navigate to the folder where you want the dbt project and create it with the command below.
dbt init dbt_sample
While creating the dbt project, you must provide your Snowflake credentials: account identifier (from the account URL), database, schema, role, username, and password; all are mandatory.
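The credentials entered at these prompts are written to a profiles.yml file (by default under ~/.dbt/). A minimal sketch of what that file looks like for Snowflake, where every value is a placeholder you replace with your own:

```yaml
# ~/.dbt/profiles.yml -- all values below are placeholders
dbt_sample:
  target: dev
  outputs:
    dev:
      type: snowflake
      account: <account_identifier>  # the part of the account URL before .snowflakecomputing.com
      user: <username>
      password: <password>
      role: <role>
      database: <database>
      warehouse: <warehouse>
      schema: <schema>
      threads: 1
```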
Step 4: Open the project you just created in Visual Studio Code.
Verify the connection with the command below:
dbt debug
Step 5: Create a single model inside the models folder of the project, and run it from the VS Code terminal.
Create a root folder inside the project directory
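In dbt, a model is just a .sql file inside the models/ folder; a minimal sketch (the model name my_first_model is a placeholder):

```sql
-- models/my_first_model.sql
-- a trivial model: replace this with a real query against your source tables
select 1 as id
```

Running dbt run --select my_first_model from the VS Code terminal builds only this model.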
Step 6: Open Docker Desktop and sign in, creating an account if needed.
Step 7: Create a Dockerfile, a docker-compose.yml, and a requirements.txt, then add the configuration below to each file.
To create a file in PowerShell: New-Item (filename)
docker-compose.yml:
version: "3.8"
services:
  web:
    build: ./
requirements.txt:
dbt-core
dbt-snowflake
Dockerfile:
FROM python:3.10.5
WORKDIR /
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY . .
# the dbt project copied above now lives at the container root (/)
RUN dbt clean --project-dir /
RUN dbt deps --project-dir /
CMD ["/bin/bash", "-c", "${cmd}"]
EXPOSE 8080
Step 8: Run the model inside the Docker container.
To build the Docker image, use the command below:
docker build -t (image name) -f (Dockerfile name) ./
Example: here the Dockerfile is named Dockerfile and the image is tagged dbt.
docker build -t dbt -f Dockerfile ./
To run the container, use the command below. Here dbt is the image name, and the cmd environment variable is substituted into the Dockerfile's CMD:
docker run -e cmd='dbt run --project-dir /' dbt
Step 9: Now start the Airflow implementation by creating a dags folder inside the project.
Create the commands.sh and dbt_airflow.py files inside the dags folder.
dbt_airflow.py:
Tasks are scheduled in this file; hourly scheduling is used here.
from airflow import DAG
import pendulum
from airflow.operators.bash import BashOperator

# run_DBT is the DAG name
with DAG(
    dag_id='run_DBT',
    description='First DAG',
    schedule_interval='@hourly',
    start_date=pendulum.datetime(2022, 10, 18, tz="UTC")) as dag:
    # the trailing space stops Airflow treating the .sh path as a Jinja template
    task = BashOperator(task_id='task_run', bash_command='/commands.sh ', dag=dag)
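The '@hourly' value passed to schedule_interval is one of Airflow's cron presets. A quick sketch mapping the common presets to their documented cron-expression equivalents:

```python
# Airflow's schedule_interval cron presets and their cron-expression equivalents
AIRFLOW_PRESETS = {
    "@hourly": "0 * * * *",    # top of every hour
    "@daily": "0 0 * * *",     # midnight every day
    "@weekly": "0 0 * * 0",    # midnight on Sunday
    "@monthly": "0 0 1 * *",   # midnight on the first of the month
    "@yearly": "0 0 1 1 *",    # midnight on January 1
}

def preset_to_cron(preset: str) -> str:
    """Return the cron expression for an Airflow schedule preset."""
    return AIRFLOW_PRESETS[preset]

print(preset_to_cron("@hourly"))  # 0 * * * *
```

Either form can be passed to schedule_interval; the preset is simply shorthand.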
commands.sh:
cd /
dbt run --project-dir /
Next, update the Dockerfile and requirements.txt so the container also installs Airflow and starts the webserver by default:
Dockerfile:
FROM python:3.10.5
WORKDIR /
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY . .
COPY /dags /root/airflow/dags
# path destination
RUN dbt clean --project-dir /
RUN dbt deps --project-dir /
RUN airflow db init
ENV cmd="airflow webserver"
CMD ["/bin/bash", "-c", "${cmd}"]
EXPOSE 8080
requirements.txt:
dbt-core
dbt-snowflake
apache-airflow
Step 10: There must be accounts set up for airflow.
Link to Airflow Account creation : Webserver — Airflow Documentation (apache.org)
Go to the Docker container and open a terminal.
# create an admin user
airflow users create \
    --username admin \
    --firstname Peter \
    --lastname Parker \
    --role Admin \
    --email spiderman@superhero.org

# start the scheduler
airflow scheduler
Step 11: Open the browser through Docker.
To reach the Airflow login page, go to localhost:8080 (the container's port 8080 must be published, e.g. with docker run -p 8080:8080 when starting the image).
Then move to the Airflow DAGs page and look for your DAG in the list.
Now trigger the DAG.
Once the DAG has been triggered, it runs the dbt models on the hourly schedule.
Finally, the dbt models are triggered using a Docker container and Apache Airflow. They can be run manually or automatically via the schedule.
About Us :
Bi3 has been recognized for being one of the fastest-growing companies in Australia. Our team has delivered substantial and complex projects for some of the largest organizations around the globe and we’re quickly building a brand that is well-known for superior delivery.
Website : https://bi3technologies.com/
Follow us on,
LinkedIn : https://www.linkedin.com/company/bi3technologies
Instagram : https://www.instagram.com/bi3technologies/
Twitter : https://twitter.com/Bi3Technologies