Easy way to manage your Airflow setup

achilleus
3 min read · Mar 30, 2020

Set up Airflow as a systemd service

Airflow can be integrated with any systemd- or upstart-based system. Running it as a service is my preferred way to start something automatically every time the system boots and to control its behavior. It lets us manage and monitor the Airflow webserver and scheduler, and it automatically restarts the Airflow daemons after a failure or a reboot.

A quick note about systemd: it is a system and service manager for Linux operating systems. It runs as the first process at boot (PID 1) and starts the other processes that the OS needs as it boots.

This is one of the quicker ways of running Airflow with systemd. As a result, the Airflow webserver and scheduler start automatically when the system restarts, and we get a higher-level interface to check the status of, start, stop, and restart the Airflow setup like any other Linux system service.

Please note that I am running all these commands as the root user; you may need to prefix them with sudo.

airflow-webserver.service

We need to create a new file at the location:
/etc/systemd/system/airflow-webserver.service

The contents of this file should be something like:

[Unit]
Description=Airflow webserver daemon
After=network.target postgresql-9.6.service
Wants=postgresql-9.6.service
[Service]
RuntimeDirectory=airflow
RuntimeDirectoryMode=0775
Type=simple
ExecStart=/usr/bin/bash -c 'source /root/anaconda3/bin/activate ; airflow webserver --pid /run/airflow/webserver.pid'
Restart=on-failure
RestartSec=5s
PrivateTmp=true
[Install]
WantedBy=multi-user.target

This creates an Airflow webserver service. Note that this is just one way to configure it; we can tweak things as required.

After: This orders our service after the listed units. The Airflow webserver is started only after network.target has been reached and, when the Postgres service is being started as well, after Postgres is up. After only controls ordering; on its own it does not cause those units to be started.

Wants: This asks systemd to start Postgres along with our service. If the Postgres service errors out and does not start at boot, our service is still started. Wants creates a weaker dependency than Requires.

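RuntimeDirectory and RuntimeDirectoryMode: These make systemd create the /run/airflow directory with the given permissions when the service starts. This is where the webserver writes its pid file. I had to add them to overcome the error: /run/airflow doesn’t exist. Can’t create pidfile.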

ExecStart: This specifies the command, with its arguments, that is run when the service is started. It is what starts the Airflow webserver process.

Restart: This tells systemd to restart our service when the Airflow webserver process is killed or exits with an error, except when you stop the service manually with a systemctl stop command. RestartSec makes systemd wait 5 seconds before each restart.

PrivateTmp: This sets up a file system namespace for the processes of the service that is not shared with processes outside the namespace, and mounts private /tmp and /var/tmp directories inside it.

WantedBy: Once the service is enabled, it is loaded as part of the standard multi-user boot process and started during boot, ordered after the units listed in After.
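If the webserver should not start at all when Postgres is unavailable, the weaker Wants can be complemented with Requires. One way to try this without editing the unit file itself is a drop-in file, which systemd merges into the unit. This is only a sketch: UNIT_DIR is a variable made up for this example, and it defaults to a scratch directory so nothing under /etc is touched until you set it to /etc/systemd/system.

```shell
# Sketch: add a Requires= dependency on Postgres through a drop-in file.
# UNIT_DIR is an assumption of this example; by default it points at a
# scratch directory so the snippet is safe to run as a trial.
UNIT_DIR="${UNIT_DIR:-$(mktemp -d)}"
DROPIN_DIR="$UNIT_DIR/airflow-webserver.service.d"

# systemd reads every *.conf file in <unit name>.d/ and merges it into the unit.
mkdir -p "$DROPIN_DIR"
cat > "$DROPIN_DIR/override.conf" <<'EOF'
[Unit]
Requires=postgresql-9.6.service
EOF

# Show what was written.
cat "$DROPIN_DIR/override.conf"
```

After writing the drop-in for real, run systemctl daemon-reload so that systemd picks it up.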

To create and start airflow-webserver service:

vi /etc/systemd/system/airflow-webserver.service
chmod 664 /etc/systemd/system/airflow-webserver.service
systemctl enable airflow-webserver.service
systemctl start airflow-webserver

You can check the status of the service using:

systemctl status airflow-webserver

Every time you change the airflow-webserver.service file, reload systemd for the changes to take effect:

systemctl daemon-reload
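Reloading only makes systemd re-read the unit file; a service that is already running keeps its old settings until it is restarted. A minimal sketch that does both steps in one go, where the function name apply_unit_changes is made up for this example:

```shell
# Hypothetical helper: re-read the unit files, then restart the named
# service so that the running process picks up the new settings.
apply_unit_changes() {
    systemctl daemon-reload && systemctl restart "$1"
}
```

It would be called as apply_unit_changes airflow-webserver.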

You can tail the logs of the airflow-webserver by using:

journalctl -u airflow-webserver.service -f

Similarly, we can create the airflow-scheduler.service

vi /etc/systemd/system/airflow-scheduler.service
chmod 664 /etc/systemd/system/airflow-scheduler.service

Add this:

[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql-9.6.service
Wants=postgresql-9.6.service
[Service]
RuntimeDirectory=airflow
RuntimeDirectoryMode=0775
Type=simple
ExecStart=/usr/bin/bash -c 'source /root/anaconda3/bin/activate ; airflow scheduler'
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target

And to enable, start, and check on it:

systemctl enable airflow-scheduler.service
systemctl start airflow-scheduler
systemctl status airflow-scheduler
journalctl -u airflow-scheduler
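With both services in place, a small loop over systemctl is-active gives a quick overview of the whole setup. This is a sketch; the function name report_airflow_units is an assumption of this example:

```shell
# Hypothetical helper: print one line per Airflow unit with its state.
# `systemctl is-active --quiet` exits with status 0 only when the unit is active.
report_airflow_units() {
    for unit in airflow-webserver airflow-scheduler; do
        if systemctl is-active --quiet "$unit"; then
            echo "$unit: active"
        else
            echo "$unit: not active"
        fi
    done
}
```

Calling report_airflow_units prints the state of the webserver and the scheduler, one per line.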

You can read more about systemd services here.

Thanks for reading! I would love to hear your thoughts or comments. Please do share the article, if you liked it. Check out my other articles here.
