Installing Airflow, DBT, Python with Docker

Komal Parekh
4 min read · Oct 8, 2021


This is something I have been working on for the past 2–3 weeks, and it was my first try at Docker. There are many articles on Docker, as well as on installing Airflow with Docker. However, I wanted to have Airflow, dbt and Python together in Docker, and I really could not come across many articles covering that. The one article I have followed is this: https://analyticsmayhem.com/dbt/apache-airflow-dbt-docker-compose/

My code is a modification of the one described in that article.

So let's get started!

What is docker and how to install it?

Docker is a software framework for building, running, and managing containers on servers and the cloud. The term “docker” may refer to either the tools (the commands and a daemon) or to the Dockerfile file format.

Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration.

I have used Docker Compose to spin up Airflow and DBT.

I have installed Docker Desktop on my Windows 10 laptop. Docker Compose comes as part of it, so there is no need for any additional installation.

Once you have installed Docker Desktop, you can open it and check that it is up and running.
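You can also double-check the installation from a terminal; both of these are standard commands that ship with Docker Desktop:

docker --version
docker-compose --version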

Let's Spin up Airflow and DBT in Docker!

1. Create a folder on your drive. I have named it 'dbt-airflow-docker-compose'.

2. Create a docker-compose.yml file in the folder.

version: '3.8'
services:
  postgres:
    image: postgres
    environment:
      POSTGRES_PASSWORD: pssd
      POSTGRES_USER: airflowuser
      POSTGRES_DB: airflowdb
      AIRFLOW_SCHEMA: airflow
    expose:
      - 5432
    restart: always
    volumes:
      - ./scripts_postgres:/docker-entrypoint-initdb.d
      # - ./sample_data:/sample_data

  postgres-dbt:
    image: postgres
    environment:
      POSTGRES_PASSWORD: pssd
      POSTGRES_USER: dbtuser
      POSTGRES_DB: dbtdb
      DBT_SCHEMA: dbt
      DBT_RAW_DATA_SCHEMA: dbt_raw_data
    expose:
      - 5432
    restart: always
    volumes:
      - ./sample_data:/sample_data

  airflow:
    build: .
    restart: always
    environment:
      DBT_PROFILES_DIR: /dbt
      AIRFLOW_HOME: /airflow
      AIRFLOW__CORE__DAGS_FOLDER: /airflow/dags
      AIRFLOW__CORE__PARALLELISM: 4
      AIRFLOW__CORE__DAG_CONCURRENCY: 4
      AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG: 4
      # AIRFLOW__ADMIN__HIDE_SENSITIVE_VARIABLE_FIELDS: False
      # Postgres details need to match with the values defined in the postgres service
      POSTGRES_USER: airflowuser
      POSTGRES_PASSWORD: pssd
      POSTGRES_HOST: postgres
      POSTGRES_PORT: 5432
      POSTGRES_DB: airflowdb
      # postgres-dbt connection details. Required for the initial loading of seed data
      # Credentials need to match with service postgres-dbt
      DBT_POSTGRES_PASSWORD: pssd
      DBT_POSTGRES_USER: dbtuser
      DBT_POSTGRES_DB: dbtdb
      DBT_DBT_SCHEMA: dbt
      DBT_DBT_RAW_DATA_SCHEMA: dbt_raw_data
      DBT_POSTGRES_HOST: postgres-dbt
      SNF_UNAME: ${SNF_UNAME}
      SNF_PWD: ${SNF_PWD}
    depends_on:
      - postgres
      - postgres-dbt
    ports:
      - 8000:8080
    volumes:
      - ./dbt:/dbt
      - ./airflow:/airflow
      - ./python:/python
      - ./datafiles:/datafiles

  adminer:
    image: adminer
    restart: always
    ports:
      - 8080:8080
    depends_on:
      - postgres
      - postgres-dbt
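Indentation is significant in YAML, so it is worth validating the file before moving on. A quick way (standard docker-compose functionality) is to let Compose parse and print the resolved configuration:

docker-compose config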

3. Create folders to hold your Airflow, Python, DBT and data files.

In order to be able to save (persist) data and also to share data between containers, Docker came up with the concept of volumes. Quite simply, volumes are directories (or files) that are outside of the default Union File System and exist as normal directories and files on the host filesystem.

So, create the folders datafiles, dbt, python and airflow. Also create a subdirectory dags under airflow.
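From inside the dbt-airflow-docker-compose folder, the whole layout (including the two script folders used in the next steps) can be created in one go, for example:

mkdir -p datafiles dbt python airflow/dags
mkdir -p scripts_airflow scripts_postgres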

4. Create a folder scripts_airflow with a file init.sh in it. The file init.sh should hold the information below.

#!/usr/bin/env bash

# Set up the Airflow metadata DB connection string from the values passed in by docker-compose
AIRFLOW__CORE__SQL_ALCHEMY_CONN="postgresql+psycopg2://${POSTGRES_USER}:${POSTGRES_PASSWORD}@${POSTGRES_HOST}:${POSTGRES_PORT}/${POSTGRES_DB}"
export AIRFLOW__CORE__SQL_ALCHEMY_CONN

# Generate a random secret key for the webserver
AIRFLOW__WEBSERVER__SECRET_KEY=$(openssl rand -hex 30)
export AIRFLOW__WEBSERVER__SECRET_KEY

AIRFLOW__CORE__EXECUTOR=LocalExecutor
export AIRFLOW__CORE__EXECUTOR

# Connection string for the dbt database, registered as an Airflow connection below
DBT_POSTGRESQL_CONN="postgresql+psycopg2://${DBT_POSTGRES_USER}:${DBT_POSTGRES_PASSWORD}@${DBT_POSTGRES_HOST}:${POSTGRES_PORT}/${DBT_POSTGRES_DB}"

cd /dbt && dbt compile

rm -f /airflow/airflow-webserver.pid
sleep 10
airflow db upgrade
sleep 10
airflow connections add 'dbt_postgres_instance_raw_data' --conn-uri "$DBT_POSTGRESQL_CONN"
airflow scheduler & airflow webserver
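One thing to note: init.sh runs dbt compile inside /dbt, which is the local ./dbt folder mounted into the container, so that folder needs to contain a dbt project (dbt_project.yml, models, etc.) and a profiles.yml matching DBT_PROFILES_DIR. As a rough sketch only (the profile name dbt_project is a placeholder and must match the profile referenced in your dbt_project.yml), a profiles.yml pointing at the postgres-dbt service could be created like this:

cat > dbt/profiles.yml <<'EOF'
# placeholder profile name - must match the 'profile' key in your dbt_project.yml
dbt_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: postgres-dbt   # service name from docker-compose.yml
      port: 5432
      user: dbtuser
      pass: pssd
      dbname: dbtdb
      schema: dbt
      threads: 4
EOF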

5. Create a folder scripts_postgres. In it, create a file 'init-user-db.sh' with the content below. (This folder is mounted to /docker-entrypoint-initdb.d in the postgres service, so the script runs automatically when the database container is first initialised.)

#!/bin/bash
set -e

psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
    ALTER ROLE $POSTGRES_USER SET search_path TO $AIRFLOW_SCHEMA;
    -- CREATE SCHEMA IF NOT EXISTS $DBT_SCHEMA AUTHORIZATION $POSTGRES_USER;
    -- CREATE SCHEMA IF NOT EXISTS $DBT_SEED_SCHEMA AUTHORIZATION $POSTGRES_USER;
    CREATE SCHEMA IF NOT EXISTS $AIRFLOW_SCHEMA AUTHORIZATION $POSTGRES_USER;
    SET datestyle = 'ISO, DMY';
EOSQL
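Once the containers are up (step 7), you can check that this script actually ran by listing the schemas in the airflow database; the service name and credentials below come straight from docker-compose.yml:

docker-compose exec postgres psql -U airflowuser -d airflowdb -c '\dn'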

6. Create a Dockerfile (named 'Dockerfile', in the same folder, since the airflow service uses build: .)

FROM python:3.7
RUN pip install 'apache-airflow[postgres]==2.1.3' && pip install dbt==0.19
RUN pip install SQLAlchemy==1.3.23
RUN mkdir /project
COPY scripts_airflow/ /project/scripts/
RUN chmod +x /project/scripts/init.sh
ENTRYPOINT [ "/project/scripts/init.sh" ]
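One caveat: Airflow 2.x ships with webserver authentication enabled, so you need a user account to log in to the UI. That is not part of the setup above; one option (my own addition, not from the original article) is to add a user-creation line to init.sh, right after airflow db upgrade, for example:

airflow users create --username admin --password admin --firstname Admin --lastname User --role Admin --email admin@example.com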

7. Time to bring up our Docker containers. Open your command prompt, cd to the location where you have done your setup, and run the command below:

docker-compose up

Once everything is up, you will see your containers in Docker Desktop!
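You can also list the running services from the command line and then open the two web UIs; the host ports are the ones mapped in docker-compose.yml (the Airflow webserver listens on 8080 inside its container, mapped to 8000 on the host):

docker-compose ps
# Airflow UI: http://localhost:8000
# Adminer:    http://localhost:8080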

Code like a BOSS!
