Celery Executor in Apache Airflow

(A Guide to Setting Up and Running Tasks with the Celery Executor)

Mukesh Kumar
Accredian
5 min read · Sep 2, 2022


Written in collaboration with Hiren Rupchandani

Preface

In the previous story, you learned how to set up the Celery Executor and its associated components. Now, it’s time to develop some tasks that can run in parallel using the Celery Executor.

Note that the Celery Executor enables us to build production-ready data pipelines: we can run several tasks in parallel with the help of Celery workers spread across different servers.

This story demonstrates Celery’s operation on the local server only, a setup you can extend to multiple remote servers in real life. Let’s go.

Creating a Python File

  • Create a new Python file named “celery_executor_demo.py” inside the airflow/dags directory on your system and open it in your favorite editor.

Importing the Necessary Modules

  • We will import the “DAG” class and the “operators.python” module from the airflow package.
  • We will also import the “datetime” and “time” modules to help us schedule and monitor the execution of the DAGs, as sketched below.
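
A minimal sketch of these imports, assuming the standard Airflow 2.x module paths:

```python
# Core DAG class and the Python task operator
from airflow import DAG
from airflow.operators.python import PythonOperator

# Standard-library helpers for scheduling and simulating work
from datetime import datetime
import time
```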

Creating a DAG object

  • We will now initialize a DAG object as follows:
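
A hedged sketch of that object; the dag_id matches the DAG we activate later in this story, while the start date and scheduling arguments are illustrative assumptions:

```python
# dag_id matches the DAG activated later in the story;
# start_date and schedule_interval are illustrative assumptions.
dag = DAG(
    dag_id="celery_executor_demo",
    start_date=datetime(2022, 1, 1),
    schedule_interval=None,  # we will trigger it manually from the UI
    catchup=False,
)
```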

Creating Some Tasks

  • We will create fourteen tasks inside our DAG; the “start” and “end” tasks indicate the start and end of the DAG.
  • The other tasks are task2_1, task2_2, task2_3, task3_1, task3_2, task3_3, task3_4, task3_5, task3_6, task4_1, task4_2, and task5.
  • These 12 tasks each call a “sleeping_function” and sleep for 5 seconds.
  • We have attached the link to the code file below for your reference.
  • The task definitions will look like the following:
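
A hedged sketch of these definitions, assuming that task1 and task6 carry the “start” and “end” task IDs (the linked code file is authoritative). In the actual file, the callable functions from the next section must be defined above this point, since python_callable references them directly:

```python
# Assumption: task1 and task6 carry the "start" and "end" task IDs.
task1 = PythonOperator(task_id="start", python_callable=hello_function, dag=dag)

# The twelve intermediate tasks all follow the same pattern.
task2_1 = PythonOperator(task_id="task2_1", python_callable=sleeping_function, dag=dag)
task2_2 = PythonOperator(task_id="task2_2", python_callable=sleeping_function, dag=dag)
task2_3 = PythonOperator(task_id="task2_3", python_callable=sleeping_function, dag=dag)
task3_1 = PythonOperator(task_id="task3_1", python_callable=sleeping_function, dag=dag)
task3_2 = PythonOperator(task_id="task3_2", python_callable=sleeping_function, dag=dag)
task3_3 = PythonOperator(task_id="task3_3", python_callable=sleeping_function, dag=dag)
task3_4 = PythonOperator(task_id="task3_4", python_callable=sleeping_function, dag=dag)
task3_5 = PythonOperator(task_id="task3_5", python_callable=sleeping_function, dag=dag)
task3_6 = PythonOperator(task_id="task3_6", python_callable=sleeping_function, dag=dag)
task4_1 = PythonOperator(task_id="task4_1", python_callable=sleeping_function, dag=dag)
task4_2 = PythonOperator(task_id="task4_2", python_callable=sleeping_function, dag=dag)
task5 = PythonOperator(task_id="task5", python_callable=sleeping_function, dag=dag)

task6 = PythonOperator(task_id="end", python_callable=bye_function, dag=dag)
```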

Creating callable functions

  • task1 calls a function named hello_function that will print some text and sleep for 5 seconds using the time.sleep() function.
  • task6 calls a function named bye_function that will print some pre-defined text and will indicate the end of the DAGRun.
  • The rest of the tasks will call a function named sleeping_function that will sleep for 5 seconds using the time.sleep() function.
  • The callable functions will look like this:
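
A minimal sketch of these callables; the exact printed messages are assumptions:

```python
def hello_function():
    # Marks the start of the DAG run, then simulates some work.
    print("Hello! Starting the DAG run.")  # assumed message
    time.sleep(5)


def sleeping_function():
    # Simulates a 5-second unit of work so the parallelism is visible in the UI.
    print("Sleeping for 5 seconds...")  # assumed message
    time.sleep(5)


def bye_function():
    # Marks the end of the DAG run.
    print("Goodbye! The DAG run is complete.")  # assumed message
```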

Setting Dependencies between Tasks

  • The dependencies will be set such that task2_1, task2_2, and task2_3 will run in parallel after task1 has finished its execution.
  • Tasks task3_1 and task3_2 will run only after task2_1 has finished its execution.
  • Tasks task3_3 and task3_4 will run only after task2_2 has finished its execution.
  • Tasks task3_5 and task3_6 will run only after task2_3 has finished its execution.
  • Task task4_1 will run only after task3_1, task3_2, and task3_3 have successfully finished their execution.
  • Task task4_2 will run only after task3_4, task3_5, and task3_6 have successfully finished their execution.
  • Task task5 will run only after task4_1 and task4_2 have successfully finished their execution.
  • Finally, task6 will end the DAGRun after it gets triggered by the execution of task5.
  • The dependencies will look like this:
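
Expressed with Airflow’s bitshift syntax, the dependencies above translate directly to:

```python
# Fan out from task1, fan back in to task6.
task1 >> [task2_1, task2_2, task2_3]
task2_1 >> [task3_1, task3_2]
task2_2 >> [task3_3, task3_4]
task2_3 >> [task3_5, task3_6]
[task3_1, task3_2, task3_3] >> task4_1
[task3_4, task3_5, task3_6] >> task4_2
[task4_1, task4_2] >> task5
task5 >> task6
```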

Code Reference

You can download and refer to the configuration file and the code file:

The Configuration File

The Code File

Execution of the DAG

Airflow Webserver
  • Activate the “celery_executor_demo” DAG, trigger it, and go to the Graph view. The DAGRun should look like this:

[Image: Visualization of DAG Execution]

  • You can also view the Gantt chart of the execution by selecting the Gantt option. The Gantt chart should look like this:

[Image: Gantt Chart for the DAG]
  • We can observe the parallel execution of various tasks.
  • Congratulations! We have implemented parallel execution in Airflow using the Celery Executor.

Final Thoughts and Closing Comments

There are some vital points many people fail to understand while pursuing their Data Science or AI journey. If you are one of them and are looking for a way to fill these gaps, check out the certification programs provided by INSAID on their website. If you liked this story, I recommend the Global Certificate in Data Science, because it covers your foundations plus machine learning algorithms (basic to advanced).

And that’s it. I hope you liked this explanation of the Celery Executor in Apache Airflow and learned something valuable. Please let me know in the comments section if you have anything to share with me. I would love to know your thoughts.

What’s next?

  • Kubernetes Executor in Airflow

