Setting up Apache Airflow with Celery Executor on Your Linux Machine

Yuvraj singh
2 min read · Feb 1, 2024

Apache Airflow stands out as a powerful open-source platform for orchestrating workflows. In this comprehensive guide, we’ll walk through the process of setting up Apache Airflow with the Celery Executor on your Linux machine, transforming it into a distributed environment comprising a master node and two worker nodes. This configuration enables parallel task execution, ensuring scalability and efficient resource utilization on your Linux setup.

Step 1: Update System Packages

Start by ensuring that all system packages on your Linux machine are up-to-date. This is a crucial initial step to prevent potential compatibility issues during the installation process.

sudo apt update && sudo apt upgrade -y

Step 2: Install Dependencies and Create Virtual Environment

Install necessary dependencies and create a virtual environment:

sudo apt install -y build-essential python3-dev libsqlite3-dev openssl sqlite3 default-libmysqlclient-dev
sudo apt install -y python3.8-venv

python3 -m venv ve
source ve/bin/activate

Step 3: Install Apache Airflow

Install Apache Airflow within the virtual environment:

pip install 'apache-airflow[celery]==2.6.3' --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.6.3/constraints-3.8.txt"
pip install mysqlclient
airflow version

Step 4: Set Up DAGs Directory

Create a directory for DAGs under the Airflow home (this guide assumes you are running as root, so AIRFLOW_HOME defaults to /root/airflow):

mkdir -p /root/airflow/dags
cd /root/airflow/dags

Step 5: Mount DAGs & Logs Directory to a DFS

Mount the DAGs and logs directories onto a distributed file system so that the master and both workers read the same DAG files and the webserver can read task logs written by the workers, as sketched below.
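A minimal sketch of this step, assuming an NFS server at a placeholder address <nfs-server-ip> exporting /airflow-share (both the address and the export path are assumptions, not from the guide); run the equivalent on the master and on each worker:

sudo apt install -y nfs-common
# the logs directory may not exist yet; the dags directory was created in step 4
mkdir -p /root/airflow/logs
# mount the shared export over the local DAGs and logs directories (hypothetical server/export)
sudo mount <nfs-server-ip>:/airflow-share/dags /root/airflow/dags
sudo mount <nfs-server-ip>:/airflow-share/logs /root/airflow/logs

Any shared filesystem (NFS, GlusterFS, EFS, and so on) works; the point is that all three nodes see the same DAG files and log location.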

Step 6: Edit Airflow Configuration

Edit the airflow.cfg configuration file (in the Airflow home, /root/airflow by default) with the following details, replacing the masked values with your own MySQL and RabbitMQ connection details:

[core]
executor = CeleryExecutor

[database]
sql_alchemy_conn = mysql://*****:**@******:3306/airflow_demo

[celery]
broker_url = amqp://****:****@*****:5672/airflowhost

[scheduler]
dag_dir_list_interval = 30
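The Celery executor also stores task state in a result backend; recent Airflow versions fall back to the metadata database when it is not set, but it can be set explicitly in the [celery] section. A minimal sketch reusing the same MySQL database (the masked credentials are placeholders, as in the snippet above):

[celery]
result_backend = db+mysql://*****:**@******:3306/airflow_demo

Note that the MySQL database (airflow_demo) and the RabbitMQ virtual host (airflowhost) must already exist and be reachable from the master and both workers.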

Step 7: Initialize Airflow

Initialize the Airflow metadata database once the configuration is in place:

airflow db init
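The guide does not show it, but the Airflow 2.x web UI requires a login account. A minimal sketch for creating an admin user (the username, name, and email below are placeholders):

airflow users create \
  --username admin \
  --firstname Admin \
  --lastname User \
  --role Admin \
  --email admin@example.com

The command prompts for a password if one is not supplied with --password.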

Step 8: Repeat Steps 1–7 on Worker Nodes

Repeat steps 1 to 7 on both worker nodes, then overwrite each worker's airflow.cfg with the master's airflow.cfg (see the sketch below) so that every node points to the same metadata database, broker, and DAGs directory.
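One way to copy the master's configuration over, assuming root SSH access to the workers (the worker IPs below are placeholders):

scp /root/airflow/airflow.cfg root@<worker-1-ip>:/root/airflow/airflow.cfg
scp /root/airflow/airflow.cfg root@<worker-2-ip>:/root/airflow/airflow.cfg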

Step 9: Start Airflow Webserver and Scheduler on the Master

Start the Airflow webserver and scheduler on the master:

airflow webserver -p 8080
airflow scheduler
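Both commands run in the foreground, so either start them in separate terminals or, as a sketch, daemonize them with the -D flag:

airflow webserver -p 8080 -D
airflow scheduler -D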

Step 10: Start Airflow Workers on Worker Nodes

Start Airflow workers on both worker nodes:

airflow celery worker
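Optionally, Flower can be started on the master node to monitor the Celery workers (it serves its UI on port 5555 by default):

airflow celery flower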

Conclusion

Your Apache Airflow setup with the Celery Executor in a distributed environment is now complete. You can access the Airflow UI on the master node at http://<master-node-ip>:8080 and start managing and monitoring your workflows.
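A quick way to verify the setup, assuming the bundled example DAGs are enabled (load_examples = True, the default): trigger one from the master and watch its tasks get picked up by the workers.

airflow dags list
airflow dags unpause example_bash_operator
airflow dags trigger example_bash_operator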
