Discover Apache Airflow by setting up a working environment on Ubuntu with PostgreSQL and SSL certificate

Jean-Michel ARNAUD
4 min read · Jun 25, 2024


In this story, we will start with a fresh Ubuntu server and, step by step, get an Apache Airflow instance up and running. As with a cake recipe, we will follow a series of elementary steps and try not to skip any, so that we get a good cake in the end.

{0} Ingredients

First, we need a virtual machine. We bought one from OVHcloud running Ubuntu 24.04. After the purchase, we received an IP address (hereafter 123.123.123.123), an account (ubuntu), and a password (password).

Second, we need a domain name (we bought ours from the same provider, OVH). Let’s say : mydomain.com

{1} Prepare the virtual machine to run Apache Airflow

Open a shell on your computer and log into this fresh virtual machine :

ssh ubuntu@123.123.123.123

Update and upgrade the operating system of the virtual machine :

sudo apt-get update
sudo apt-get upgrade
sudo apt autoremove

Python 3 was already installed, but pip was not, so we install it :

sudo apt install python3-pip

Install the Apache web server (it will act as a reverse proxy in front of Airflow) :

sudo apt install apache2

Install the virtual environment module for Python :

sudo apt install python3-venv

Install PostgreSQL for the database :

sudo apt install postgresql

Connect to PostgreSQL :

sudo -u postgres psql template1

Inside PostgreSQL, create a database, create a user, grant privileges on the database, make this user the owner of the database, connect to the newly created database, check the encoding (we should have UTF8), grant privileges on its public schema (this must be done after connecting, otherwise the grant applies to template1), add a schema for saving data, then exit :

CREATE DATABASE airflow_db;
CREATE USER airflow_user WITH PASSWORD 'my_db_password';
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
ALTER DATABASE airflow_db OWNER TO airflow_user;
\c airflow_db
SHOW SERVER_ENCODING;
GRANT ALL ON SCHEMA public TO airflow_user;
CREATE SCHEMA airflow_schema;
exit

Open the PostgreSQL client authentication file :

sudo vi /etc/postgresql/16/main/pg_hba.conf

Change “peer” to “md5” on the following line, then save and quit :

# "local" is for Unix domain socket connections only
local all all md5
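The change above only takes effect once PostgreSQL has reloaded its configuration, which the steps so far do not do. As a minimal sketch (our assumption, not part of the original procedure), the hypothetical helper `switch_local_to_md5` below previews the edit: it reads a pg_hba.conf on standard input and prints it with only the generic `local all all` rule switched from `peer` to `md5`, leaving the `postgres` rule untouched so that `sudo -u postgres psql` keeps working.

```shell
# Hypothetical helper: read a pg_hba.conf on stdin and print it with the
# generic "local all all ... peer" rule switched to md5.
# Every other rule (including the one for the postgres user) is left as-is.
switch_local_to_md5() {
  sed -E 's/^(local[[:space:]]+all[[:space:]]+all[[:space:]]+)peer[[:space:]]*$/\1md5/'
}

# Possible usage on the server (path as used above):
#   sudo cat /etc/postgresql/16/main/pg_hba.conf | switch_local_to_md5 | grep '^local'
# After saving the real file, restart PostgreSQL and try a password login:
#   sudo systemctl restart postgresql
#   psql -U airflow_user -d airflow_db -W -c 'SELECT 1;'
```

If the last command prompts for my_db_password and returns a row, password authentication over the Unix socket is working.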

Create a system user to run Apache Airflow :

sudo adduser airflow

Close the SSH connection of the ubuntu user, and open a new one as the airflow user :

exit
ssh airflow@123.123.123.123

Create a virtual environment in the airflow home directory :

cd /home/airflow/
python3 -m venv airflow-env

Activate the virtual environment :

source airflow-env/bin/activate

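Install Apache Airflow, using the constraints file that matches the Python version of the virtual environment (3.12 on Ubuntu 24.04) :

pip3 install "apache-airflow[celery]==2.9.2" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.2/constraints-3.12.txt"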
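Airflow publishes one constraints file per supported Python version, so the `constraints-X.Y.txt` part of the URL is meant to match the interpreter inside the virtual environment. The sketch below shows one way to build that URL instead of typing it by hand; `constraint_url` is a hypothetical helper name of ours, not an Airflow command.

```shell
# Hypothetical helper: build the constraints URL for a given Airflow version
# ($1) and Python "major.minor" version ($2).
constraint_url() {
  printf 'https://raw.githubusercontent.com/apache/airflow/constraints-%s/constraints-%s.txt\n' "$1" "$2"
}

# Possible usage inside the activated virtual environment:
#   PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
#   pip3 install "apache-airflow[celery]==2.9.2" --constraint "$(constraint_url 2.9.2 "$PYTHON_VERSION")"
```

Detecting the version this way keeps the command valid if the server is later rebuilt on a release with a different Python.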

Install the code editor plugin (useful to code directly inside the browser) :

pip3 install airflow-code-editor

Install the Python library used to connect to the database :

pip3 install psycopg2-binary

Install the package allowing Airflow to access PostgreSQL :

pip3 install apache-airflow-providers-postgres

Initialize Airflow (this creates the airflow directory and its default airflow.cfg) :

airflow db init

Update the configuration of Apache Airflow :

cd airflow
cp airflow.cfg airflow.cfg.genuine
vi airflow.cfg

Then, inside the file, find each of the following settings, change its value as shown, save, and quit the text editor :

#...
executor = LocalExecutor
#...
warn_deployment_exposure = False
#...
sql_alchemy_conn = postgresql+psycopg2://airflow_user:my_db_password@localhost/airflow_db
#...
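Airflow also reads its settings from environment variables named `AIRFLOW__{SECTION}__{KEY}`, which can be handy to check the connection string without editing airflow.cfg again. The following is a minimal sketch under that assumption; `airflow_db_uri` is a hypothetical helper of ours that only assembles the URI shown above.

```shell
# Hypothetical helper: assemble the SQLAlchemy URI used in sql_alchemy_conn.
# Arguments: user, password, host, database.
# The password is inserted as-is, so characters such as "@" or "/" would
# need to be URL-encoded before being passed in.
airflow_db_uri() {
  printf 'postgresql+psycopg2://%s:%s@%s/%s\n' "$1" "$2" "$3" "$4"
}

# Possible usage: override the value from airflow.cfg for the current shell,
# then ask Airflow which value it actually sees.
#   export AIRFLOW__DATABASE__SQL_ALCHEMY_CONN="$(airflow_db_uri airflow_user my_db_password localhost airflow_db)"
#   airflow config get-value database sql_alchemy_conn
```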

Moreover, you can optionally apply the following changes :

# ...
load_examples = False
# ...
load_default_connections = False
# ...
from_email = noreply@mydomain.com
#...
smtp_host = ssl0.ovh.net
smtp_user = noreply@mydomain.com
smtp_password = noreply_password
smtp_port = 587
smtp_mail_from = noreply@mydomain.com

Migrate the database :

airflow db migrate

Create an admin account for Apache Airflow :

airflow users create --role Admin --username your_nickname --email your@email.com --firstname Your_Firstname --lastname Your_Lastname --password Your_Password

Create two directories for the future DAGs and data :

cd /home/airflow/airflow
mkdir data
mkdir dags

Close the SSH connection of the airflow user, and open a new one as the ubuntu user :

exit
ssh ubuntu@123.123.123.123

{2} Run Apache Airflow

Create a systemd service to manage the Apache Airflow webserver :

sudo vi /etc/systemd/system/airflow-webserver.service

Paste the following code inside this file, save, and quit :

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service
Wants=postgresql.service

[Service]
EnvironmentFile=/etc/environment
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/bash -c 'source /home/airflow/airflow-env/bin/activate; airflow webserver'
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Create a systemd service to manage the Apache Airflow scheduler :

sudo vi /etc/systemd/system/airflow-scheduler.service

Paste the following code inside this file, save, and quit :

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service
Wants=postgresql.service

[Service]
EnvironmentFile=/etc/environment
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/bash -c 'source /home/airflow/airflow-env/bin/activate; airflow scheduler'
Restart=on-failure
RestartSec=5s
PrivateTmp=true

[Install]
WantedBy=multi-user.target

Reload the systemd configuration so that it picks up the two new services :

sudo systemctl daemon-reload

Run these two services :

sudo service airflow-webserver start
sudo service airflow-scheduler start
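Once both services are started, it can help to confirm that they respond before moving on. The Airflow webserver exposes a `/health` endpoint that returns a small JSON document with a status per component. The sketch below is our own addition and relies on two assumptions: the webserver listens on its default port 8080, and the hypothetical helper `airflow_health_ok` uses a simple text match instead of a real JSON parser.

```shell
# Hypothetical helper: read the /health JSON on stdin and succeed only if
# at least one component reports "healthy" and none reports "unhealthy".
airflow_health_ok() {
  body="$(cat)"
  printf '%s' "$body" | grep -q '"status"[[:space:]]*:[[:space:]]*"healthy"' || return 1
  ! printf '%s' "$body" | grep -q '"status"[[:space:]]*:[[:space:]]*"unhealthy"'
}

# Possible usage on the server:
#   curl -s http://127.0.0.1:8080/health | airflow_health_ok && echo "Airflow is healthy"
# To have both services started automatically at boot as well:
#   sudo systemctl enable airflow-webserver airflow-scheduler
```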

Allow the airflow user to start, stop, and restart these two services without giving it full sudo rights :

sudo vi /etc/sudoers.d/airflow_services

Add these lines to the file, save, and quit :

Cmnd_Alias USER_SERVICES = /usr/bin/systemctl start airflow-webserver, /usr/bin/systemctl stop airflow-webserver, /usr/bin/systemctl restart airflow-webserver, /usr/bin/systemctl start airflow-scheduler, /usr/bin/systemctl stop airflow-scheduler, /usr/bin/systemctl restart airflow-scheduler

airflow ALL=(ALL) PASSWD:USER_SERVICES

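Install an SSL certificate (provide the domain name bought previously; the DNS entries described in step {3} must already point to this server, otherwise the domain validation fails) :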

sudo snap install --classic certbot
sudo ln -s /snap/bin/certbot /usr/bin/certbot
sudo certbot --apache

Edit the SSL virtual host file generated by certbot :

sudo vi /etc/apache2/sites-enabled/000-default-le-ssl.conf

Add the following lines before </VirtualHost> :

ProxyPreserveHost On
ProxyPass "/" "http://127.0.0.1:8080/"
ProxyPassReverse "/" "http://127.0.0.1:8080/"

Enable the Apache modules needed for the reverse proxy, then restart Apache :

sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod rewrite
sudo systemctl restart apache2

Quit the virtual machine :

exit

{3} Update DNS

In the DNS zone of mydomain.com (from the web interface of your domain provider), add the following entries :

| Domain           | Type | TTL | Target          |
|------------------|------|-----|-----------------|
| mydomain.com     | A    | 0   | 123.123.123.123 |
| www.mydomain.com | A    | 0   | 123.123.123.123 |
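DNS changes can take a while to propagate, so it may be worth checking that the domain resolves to the server before opening the browser. The sketch below is our own addition: `resolves_to` is a hypothetical helper, and it assumes the `dig` tool (from the bind9-dnsutils package on Ubuntu) is available on your computer.

```shell
# Hypothetical helper: read resolved addresses (one per line) on stdin and
# succeed only if the expected IP ($1) is exactly one of them.
resolves_to() {
  grep -Fxq "$1"
}

# Possible usage, with the placeholder values of this story:
#   dig +short mydomain.com A | resolves_to 123.123.123.123 && echo "DNS is ready"
#   dig +short www.mydomain.com A | resolves_to 123.123.123.123 && echo "DNS is ready (www)"
```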

{4} Log into Apache Airflow

Enter the following address in your web browser :

https://mydomain.com

{5} Miscellaneous

A useful command if you run into a problem with Airflow :

journalctl -u airflow-webserver.service
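The same command works for the scheduler service, and the output can be narrowed down when the log is long. As a small sketch of ours, the hypothetical helper `errors_only` keeps only the lines that mention an ERROR or CRITICAL level; it is a plain text filter and makes no assumption about the exact log format.

```shell
# Hypothetical helper: keep only the log lines on stdin that mention
# an ERROR or CRITICAL level.
errors_only() {
  grep -E 'ERROR|CRITICAL'
}

# Possible usage on the server:
#   journalctl -u airflow-scheduler.service --since "1 hour ago" --no-pager | errors_only
# To follow the webserver log live instead:
#   journalctl -u airflow-webserver.service -f
```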

The adventure continues on:
