Celery Executor in Apache-Airflow

(A Guide to Setup & Running of Tasks using Celery Executor)

Mukesh Kumar
Accredian
4 min read · Sep 2, 2022


Preface

In the previous story, you learned how to set up Celery Executor and its associated components. Now, it’s time to develop some tasks that can run in parallel using the Celery Executor.

Please note that using the Celery Executor enables us to build production-ready data pipelines: we can run several tasks in parallel with the help of Celery workers spread across different servers.

You will see a simple demonstration of Celery at work on the local server only, which you can extend to multiple remote servers in a real deployment. Let's go.

Creating a Python File

  • Create a new Python file named “celery_executor_demo.py” inside the airflow/dags directory on your system and open it in your favorite editor.

Importing the Necessary Modules

  • We will import the “DAG” module and the “operators.python” module from the airflow package.
  • We will also import the “datetime” and “time” modules to help us schedule and monitor the execution of the DAGs, as shown in the sketch below.
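
A minimal sketch of these imports at the top of celery_executor_demo.py (assuming Airflow 2.x; refer to the attached code file for the exact version):

    from datetime import datetime
    import time

    from airflow import DAG
    from airflow.operators.python import PythonOperator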

Creating a DAG object

  • We will now initialize a DAG object as follows:
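
A minimal sketch of the DAG object, continuing the same file. The start_date, schedule_interval, and catchup values here are assumptions for a manually triggered demo; the attached code file is the reference:

    dag = DAG(
        dag_id="celery_executor_demo",
        start_date=datetime(2022, 9, 1),
        schedule_interval=None,  # assumed: the DAG is triggered manually
        catchup=False,
    )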

Creating Some Tasks

  • We will create fourteen tasks inside our DAG; the “start” task (task1) and the “end” task (task6) indicate the start and end of the DAG.
  • The other tasks are task2_1, task2_2, task2_3, task3_1, task3_2, task3_3, task3_4, task3_5, task3_6, task4_1, task4_2, and task5.
  • These 12 tasks call a “sleeping_function” and sleep for 5 seconds each.
  • The task definitions will look like the sketch after this list.
  • We have attached the link to the full code file below for your reference.
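
Continuing the same file, the task definitions might look roughly like the sketch below. Only a few of the twelve sleeping tasks are written out; the callables (hello_function, sleeping_function, bye_function) must appear earlier in the actual file and are shown in the next section:

    # task1 and task6 mark the start and end of the DAG run.
    task1 = PythonOperator(task_id="task1", python_callable=hello_function, dag=dag)
    task6 = PythonOperator(task_id="task6", python_callable=bye_function, dag=dag)

    # The twelve intermediate tasks all call sleeping_function.
    task2_1 = PythonOperator(task_id="task2_1", python_callable=sleeping_function, dag=dag)
    task2_2 = PythonOperator(task_id="task2_2", python_callable=sleeping_function, dag=dag)
    task2_3 = PythonOperator(task_id="task2_3", python_callable=sleeping_function, dag=dag)
    # ... task3_1 through task3_6, task4_1, task4_2, and task5 are defined the same way.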

Creating callable functions

  • task1 calls a function named hello_function that will print some text and sleep for 5 seconds using the time.sleep() function.
  • task6 calls a function named bye_function that will print some pre-defined text and will indicate the end of the DAGRun.
  • The rest of the tasks will call a function named sleeping_function that will sleep for 5 seconds using the time.sleep() function.
  • The callable functions will look like this:
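
A sketch of the three callables; the exact wording of the printed messages is an assumption:

    def hello_function():
        # Marks the start of the DAG run.
        print("Hello! Starting the DAG run.")
        time.sleep(5)

    def sleeping_function():
        # Called by the twelve intermediate tasks; simulates 5 seconds of work.
        print("Sleeping for 5 seconds...")
        time.sleep(5)

    def bye_function():
        # Marks the end of the DAG run.
        print("Goodbye! The DAG run is complete.")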

Setting Dependencies between Tasks

  • The dependencies will be set such that task2_1, task2_2, and task2_3 will run in parallel after task1 has finished its execution.
  • Tasks task3_1 and task3_2 will run only after task2_1 has finished its execution.
  • Tasks task3_3 and task3_4 will run only after task2_2 has finished its execution.
  • Tasks task3_5 and task3_6 will run only after task2_3 has finished its execution.
  • Task task4_1 will run only after task3_1, task3_2, and task3_3 have successfully finished their execution.
  • Task task4_2 will run only after task3_4, task3_5, and task3_6 have successfully finished their execution.
  • Task task5 will run only after task4_1 and task4_2 have successfully finished their execution.
  • Finally, task6 will end the DAGRun after it gets triggered by the execution of task5.
  • The dependencies will look like this:
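
Using Airflow's bit-shift dependency syntax, the dependencies described above can be expressed as follows (a sketch; see the attached code file for the reference version):

    task1 >> [task2_1, task2_2, task2_3]
    task2_1 >> [task3_1, task3_2]
    task2_2 >> [task3_3, task3_4]
    task2_3 >> [task3_5, task3_6]
    [task3_1, task3_2, task3_3] >> task4_1
    [task3_4, task3_5, task3_6] >> task4_2
    [task4_1, task4_2] >> task5
    task5 >> task6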

Code Reference

You can download and refer to the configuration file and the code file:

The Configuration File

The Code File

Execution of the DAG

Airflow Webserver
  • Activate the “celery_executor_demo” DAG, trigger it, and go to the Graph view. The DAGRun should look like this:
Visualization of DAG Execution
  • You can also view the Gantt chart of the execution by selecting the Gantt option. The Gantt chart should look like this:
Gantt Chart for the DAG
  • We can observe the parallel execution of various tasks.
  • Congratulations! We have implemented parallel execution in Airflow using the Celery Executor.

Final Thoughts and Closing Comments

There are some vital points many people fail to understand while pursuing their journey in Computer Science, Data Science, and AI. If you are one of them and are looking for a way to fill these gaps, then follow me and subscribe for more forthcoming articles on Python, Computer Science, Data Science, Machine Learning, and Artificial Intelligence.

If you found this read helpful, hit the Clap 👏. Your encouragement keeps me motivated to create more content like this.

What’s next?

  • Kubernetes Executor in Airflow
