Discover Apache Airflow by setting up a working environment on Ubuntu with PostgreSQL and an SSL certificate
In this story, we start with a fresh Ubuntu server and, step by step, get an Apache Airflow instance up and running. As with a cake recipe, we follow a series of elementary actions and try not to skip any, so that the cake comes out right in the end.
{0} Ingredients
First, we need a virtual machine. We bought one from OVH cloud running Ubuntu 24.04. After the purchase, we received an IP address (hereafter 123.123.123.123), an account (ubuntu), and a password (password).
Second, we need a domain name (bought from the same provider in our case, OVH). Let’s say : mydomain.com
{1} Prepare the virtual machine to run Apache Airflow
Open a shell on your computer and log into this fresh virtual machine :
ssh ubuntu@123.123.123.123
Update and upgrade the operating system of the virtual machine :
sudo apt-get update
sudo apt-get upgrade
sudo apt autoremove
Python 3 is already installed, but pip is not, so we install it :
sudo apt install python3-pip
Install Apache :
sudo apt install apache2
Install the virtual environment module for Python :
sudo apt install python3-venv
Install PostgreSQL for the database :
sudo apt install postgresql
Connect to PostgreSQL :
sudo -u postgres psql template1
Inside PostgreSQL, create a database, create a user, grant privileges on the database, connect to the newly created database, grant privileges on its public schema, check the encoding (we should have UTF8), change the owner of the database, add a schema for saving data, then exit :
CREATE DATABASE airflow_db;
CREATE USER airflow_user WITH PASSWORD 'my_db_password';
GRANT ALL PRIVILEGES ON DATABASE airflow_db TO airflow_user;
\c airflow_db
GRANT ALL ON SCHEMA public TO airflow_user;
SHOW SERVER_ENCODING;
ALTER DATABASE airflow_db OWNER TO airflow_user;
CREATE SCHEMA airflow_schema;
exit
Open the postgresql configuration file :
sudo vi /etc/postgresql/16/main/pg_hba.conf
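On the line below, change “peer” to “md5”, save and quit :
# "local" is for Unix domain socket connections only
local all all md5
Reload PostgreSQL so that the change takes effect :
sudo systemctl reload postgresql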
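To double-check the edit, the "local" rules of pg_hba.conf can be listed with a few lines of Python. This is only a sketch: the helper name local_auth_methods is ours, and the file usually has to be read as root or as the postgres user.

```python
def local_auth_methods(pg_hba_text):
    """Return (database, user, method) for every 'local' rule in pg_hba.conf."""
    rules = []
    for line in pg_hba_text.splitlines():
        # Drop trailing comments, then split the rule into its columns.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 4 and fields[0] == "local":
            rules.append((fields[1], fields[2], fields[3]))
    return rules


# Hypothetical excerpt, similar to the file edited above.
sample = """
local   all   postgres   peer
# "local" is for Unix domain socket connections only
local   all   all        md5
host    all   all        127.0.0.1/32   scram-sha-256
"""
for database, user, method in local_auth_methods(sample):
    print(database, user, method)
```

On the server, the same function can be fed the content of /etc/postgresql/16/main/pg_hba.conf; the rule for "all all" should report md5.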
Create a user for using Apache Airflow :
sudo adduser airflow
Close the SSH connection of the ubuntu user, and open a new one as the airflow user :
exit
ssh airflow@123.123.123.123
Create a virtual environment in the airflow home directory :
cd /home/airflow/
python3 -m venv airflow-env
Activate the virtual environment :
source airflow-env/bin/activate
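Install Apache Airflow (the constraints file must match the Python version of the virtual environment, which is 3.12 on Ubuntu 24.04) :
pip3 install "apache-airflow[celery]==2.9.2" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.9.2/constraints-3.12.txt"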
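The constraints URL embeds both the Airflow version and the Python version, so it can be derived rather than typed by hand. Here is a minimal sketch, meant to be run with the interpreter of airflow-env; the helper name constraints_url is ours, not part of Airflow.

```python
import sys

AIRFLOW_VERSION = "2.9.2"  # the version installed in this tutorial


def constraints_url(airflow_version, python_version=None):
    """Build the constraints URL for an Airflow/Python pair.

    python_version is a (major, minor) tuple; by default the running
    interpreter is used.
    """
    major, minor = python_version or sys.version_info[:2]
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{major}.{minor}.txt"
    )


# Prints the URL matching the interpreter that runs this script.
print(constraints_url(AIRFLOW_VERSION))
```

The printed URL can then be passed to the --constraint option of pip.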
Install the code editor plugin (useful to code directly inside the browser) :
pip3 install airflow-code-editor
Install the python library to connect to the database :
pip3 install psycopg2-binary
Install the package allowing Airflow to access PostgreSQL :
pip3 install apache-airflow-providers-postgres
Initialize Airflow (this also generates the default configuration in /home/airflow/airflow) :
airflow db init
Update the configuration of Apache Airflow :
cd airflow
cp airflow.cfg airflow.cfg.genuine
vi airflow.cfg
Then, inside the file, change the following settings (each of them already exists in the file), save, and quit the text editor :
#...
executor = LocalExecutor
#...
warn_deployment_exposure = False
#...
sql_alchemy_conn = postgresql+psycopg2://airflow_user:my_db_password@localhost/airflow_db
#...
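The password used above contains only letters and underscores. With a password that includes characters such as @ or /, the connection URL has to be percent-encoded. The sketch below shows one way to build it with the standard library; the function name sqlalchemy_conn is hypothetical.

```python
from urllib.parse import quote


def sqlalchemy_conn(user, password, host, database):
    """Build a postgresql+psycopg2 URL with percent-encoded credentials."""
    return (
        "postgresql+psycopg2://"
        f"{quote(user, safe='')}:{quote(password, safe='')}@{host}/{database}"
    )


# With the values of this tutorial, nothing needs encoding.
print(sqlalchemy_conn("airflow_user", "my_db_password", "localhost", "airflow_db"))

# A password with reserved characters gets encoded.
print(sqlalchemy_conn("airflow_user", "p@ss/word", "localhost", "airflow_db"))
```

If the encoded password contains a % sign, it may have to be written as %% in airflow.cfg; this is worth checking against the Airflow configuration documentation before relying on it.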
Moreover, you can optionally apply the following changes :
# ...
load_examples = False
# ...
load_default_connections = False
# ...
from_email = noreply@mydomain.com
#...
smtp_host = ssl0.ovh.net
smtp_user = noreply@mydomain.com
smtp_password = noreply_password
smtp_port = 587
smtp_mail_from = noreply@mydomain.com
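Since airflow.cfg is long, it is easy to edit the wrong line. The sketch below parses the file and reports the settings that do not have the expected value. The section names are assumed from the default Airflow 2.9 layout, and EXPECTED and find_mismatches are names made up for this example.

```python
import configparser

# (section, key, expected value); sections assumed from the default airflow.cfg.
EXPECTED = [
    ("core", "executor", "LocalExecutor"),
    ("core", "load_examples", "False"),
    ("webserver", "warn_deployment_exposure", "False"),
]


def find_mismatches(cfg_text, expected=EXPECTED):
    """Return (section, key, actual value) for every setting that differs."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    mismatches = []
    for section, key, wanted in expected:
        actual = parser.get(section, key, fallback=None)
        if actual != wanted:
            mismatches.append((section, key, actual))
    return mismatches


# Hypothetical excerpt in which load_examples was left unchanged.
sample = """
[core]
executor = LocalExecutor
load_examples = True

[webserver]
warn_deployment_exposure = False
"""
print(find_mismatches(sample))
```

On the server, the content of /home/airflow/airflow/airflow.cfg can be passed instead of the sample; an empty list means every listed setting has the expected value.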
Migrate the database :
airflow db migrate
Create an admin account for Apache Airflow :
airflow users create --role Admin --username your_nickname --email your@email.com --firstname Your_Firstname --lastname Your_Lastname --password Your_Password
Create two directories for the future DAGs and data :
cd /home/airflow/airflow
mkdir data
mkdir dags
Close the SSH connection of the airflow user, and open a new one as the ubuntu user :
exit
ssh ubuntu@123.123.123.123
{2} Run Apache Airflow
Create a service to manage the Apache Airflow webserver :
sudo vi /etc/systemd/system/airflow-webserver.service
Paste the following code inside this file, save, and quit :
[Unit]
Description=Airflow webserver daemon
After=network.target postgresql.service
Wants=postgresql.service
[Service]
EnvironmentFile=/etc/environment
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/bash -c 'source /home/airflow/airflow-env/bin/activate; airflow webserver'
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
Create a service to manage the Apache Airflow scheduler :
sudo vi /etc/systemd/system/airflow-scheduler.service
Paste the following code inside this file, save, and quit :
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service
Wants=postgresql.service
[Service]
EnvironmentFile=/etc/environment
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/bash -c 'source /home/airflow/airflow-env/bin/activate; airflow scheduler'
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
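The two unit files differ only by the Airflow component they start. As a sketch, they can be generated from a single template, which avoids copy-and-paste mistakes. The names UNIT_TEMPLATE and render_unit are ours; the content is the one of the two files above.

```python
UNIT_TEMPLATE = """[Unit]
Description=Airflow {component} daemon
After=network.target postgresql.service
Wants=postgresql.service
[Service]
EnvironmentFile=/etc/environment
User=airflow
Group=airflow
Type=simple
ExecStart=/bin/bash -c 'source /home/airflow/airflow-env/bin/activate; airflow {component}'
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target
"""


def render_unit(component):
    """Return the unit file content for 'webserver' or 'scheduler'."""
    if component not in ("webserver", "scheduler"):
        raise ValueError(f"unsupported component: {component}")
    return UNIT_TEMPLATE.format(component=component)


# Show the generated scheduler unit; nothing is written to disk here.
print(render_unit("scheduler"))
```

The output of render_unit("webserver") and render_unit("scheduler") can be saved as airflow-webserver.service and airflow-scheduler.service in /etc/systemd/system/.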
Reload the daemon :
sudo systemctl daemon-reload
Run these two services :
sudo service airflow-webserver start
sudo service airflow-scheduler start
Allow the airflow user to start, stop, and restart these services through sudo, without giving it full sudo rights :
sudo vi /etc/sudoers.d/airflow_services
Add these lines to the file, save, and quit :
Cmnd_Alias USER_SERVICES = /usr/bin/systemctl start airflow-webserver, /usr/bin/systemctl stop airflow-webserver, /usr/bin/systemctl restart airflow-webserver, /usr/bin/systemctl start airflow-scheduler, /usr/bin/systemctl stop airflow-scheduler, /usr/bin/systemctl restart airflow-scheduler
airflow ALL=(ALL) PASSWD:USER_SERVICES
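The alias line is long and easy to mistype. The sketch below rebuilds both sudoers lines from the list of services and actions; sudoers_lines is a hypothetical helper, and the output is identical to the two lines above.

```python
SERVICES = ("airflow-webserver", "airflow-scheduler")
ACTIONS = ("start", "stop", "restart")


def sudoers_lines(user="airflow", services=SERVICES, actions=ACTIONS):
    """Return the Cmnd_Alias line and the rule that grants it to the user."""
    commands = [
        f"/usr/bin/systemctl {action} {service}"
        for service in services
        for action in actions
    ]
    alias = "Cmnd_Alias USER_SERVICES = " + ", ".join(commands)
    rule = f"{user} ALL=(ALL) PASSWD:USER_SERVICES"
    return [alias, rule]


print("\n".join(sudoers_lines()))
```

Whatever the way the file is written, its syntax can be checked with sudo visudo -cf /etc/sudoers.d/airflow_services before closing the session.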
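Install an SSL certificate (provide the domain name bought previously; the DNS entries of section {3} must already point to this server, otherwise the validation by Let's Encrypt fails) :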
sudo snap install --classic certbot
sudo ln -s /snap/bin/certbot /usr/bin/certbot
sudo certbot --apache
Edit the file :
sudo vi /etc/apache2/sites-enabled/000-default-le-ssl.conf
Add the following lines before </VirtualHost> :
ProxyPreserveHost On
ProxyPass "/" "http://127.0.0.1:8080/"
ProxyPassReverse "/" "http://127.0.0.1:8080/"
Enable the Apache modules needed for the reverse proxy :
sudo a2enmod proxy
sudo a2enmod proxy_http
sudo a2enmod rewrite
sudo systemctl restart apache2
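Before testing through Apache, it helps to confirm that the webserver itself answers on port 8080. The Airflow webserver exposes a /health endpoint that returns JSON. The sketch below reads it and lists the components reported as unhealthy; the function names are ours, and the URL assumes the default port used in the ProxyPass lines.

```python
import json
from urllib.request import urlopen

HEALTH_URL = "http://127.0.0.1:8080/health"  # default webserver port


def unhealthy_components(payload):
    """Return the sorted names of components whose status is 'unhealthy'."""
    return sorted(
        name
        for name, info in payload.items()
        if isinstance(info, dict) and info.get("status") == "unhealthy"
    )


def check(url=HEALTH_URL, timeout=5):
    """Fetch the health endpoint and return the unhealthy components."""
    with urlopen(url, timeout=timeout) as response:
        return unhealthy_components(json.load(response))
```

Calling check() on the server should return an empty list when both services are running. Components that are not started in this setup report no status and are ignored by this sketch.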
Quit the virtual machine :
exit
{3} Update DNS
In the DNS zone of mydomain.com (from the web interface of your domain provider), add the following entries :
| Domain           | Type | TTL | Target          |
|------------------|------|-----|-----------------|
| mydomain.com     | A    | 0   | 123.123.123.123 |
| www.mydomain.com | A    | 0   | 123.123.123.123 |
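DNS changes can take a while to propagate. A small check like the following tells whether a name already resolves to the server. This is a sketch: resolves_to is a hypothetical helper, and the resolver argument exists only so that the function can be tried without network access.

```python
import socket


def resolves_to(hostname, expected_ip, resolver=socket.gethostbyname):
    """Return True if hostname currently resolves to expected_ip."""
    try:
        return resolver(hostname) == expected_ip
    except socket.gaierror:
        # The name does not resolve (yet).
        return False
```

For example, resolves_to("mydomain.com", "123.123.123.123") should return True for both entries of the table once the zone is live.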
{4} Log into Apache Airflow
Enter the following address in your web browser :
https://mydomain.com
{5} Miscellaneous
A useful command if there is a problem with Airflow :
journalctl -u airflow-webserver.service
The adventure continues on: