Local Executor in Apache-Airflow

(A Guide to Setup & Running of Tasks using Local Executor)

Mukesh Kumar
Accredian
5 min read · Aug 19, 2022


Preface

Previously, you saw a summary of some of the built-in executors that Apache-Airflow offers. You also learned that the local executor enables users to run multiple task instances in parallel, and that it works with either a MySQL or a PostgreSQL database, since these databases allow multiple connections, which is what lets us achieve parallelism.

If you are interested in more details, the airflow documentation covers the limited and unlimited parallelism modes that the LocalExecutor can offer. Please note that we are not covering the sequential executor, the default executor of Apache-Airflow, because it runs tasks one by one and doesn’t offer parallelism.

In the rest of the story, you will discover the setup of the Local Executor and its execution mechanism. So, let’s get started.

Setup for Local Executor

  • Before we start implementing and executing our dags with the local executor, we need to set up MySQL on our system (the steps work on both WSL2 and Ubuntu) and configure our airflow application to use it.
  • You can refer to the airflow documentation for configuring MySQL with airflow.
  • After setting up the MySQL database, navigate to the airflow.cfg file in the airflow directory, search for the executor variable, and change its value from SequentialExecutor to LocalExecutor, as sketched after this list.
  • Save the file and run the airflow webserver and scheduler to check whether the settings work correctly with the LocalExecutor. If yes, we can proceed with DAG creation. If not, let me know in the comments with details, and I will try to help you in the best possible manner.
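
As a rough illustration, the relevant entries in airflow.cfg would look something like the snippet below. The connection string is only a placeholder, assuming a local MySQL database named airflow_db with a dedicated airflow_user; replace it with your own details.

    [core]
    # Switch from the default SequentialExecutor to the LocalExecutor
    executor = LocalExecutor

    [database]
    # Placeholder connection string; use your own MySQL credentials.
    # In Airflow versions before 2.3, this setting lives under [core] instead.
    sql_alchemy_conn = mysql+mysqldb://airflow_user:airflow_pass@localhost:3306/airflow_db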

Creating a Python File

  • Similar to the previous article, we will create a DAG file consisting of multiple tasks and set the dependencies between tasks that require parallel execution.
  • Create a new Python file named “local_executor_demo.py” inside the dags directory of the airflow directory on your system, and open it in your favorite editor.
  • We highly recommend opening the entire airflow directory in your editor because it makes navigating the project much easier.

Importing the Necessary Modules

  • We will import the “DAG” module and the “operators.python” module from the airflow package.
  • In addition, we will import the “datetime” module to help us schedule the dags.
  • Finally, we will import the time module to observe the execution of tasks in the DAG. A sketch of these imports follows this list.
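
A minimal sketch of those imports (the exact import style may differ slightly from the original file):

    import time                            # used to make the tasks sleep
    from datetime import datetime          # used to schedule the DAG

    from airflow import DAG                                # the DAG module
    from airflow.operators.python import PythonOperator    # the operators.python module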

Creating a DAG object

  • Next, we will initialize a DAG object as follows:
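
Here is a minimal sketch of the DAG object, using the dag_id “local_executor_demo” referenced later in the article; the start date, schedule, and catchup settings are illustrative assumptions.

    with DAG(
        dag_id="local_executor_demo",
        description="Demonstrating parallel execution with the LocalExecutor",
        start_date=datetime(2022, 8, 1),   # assumed start date
        schedule_interval="@daily",        # assumed schedule
        catchup=False,                     # skip backfilling of past runs
    ) as dag:
        ...                                # the tasks from the next section go here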

Creating Some Tasks

  • We will create four tasks inside our DAG, named task1, task2_1, task2_2, and task3.
  • task1: Calls a “hello_function” and sleeps for 5 seconds.
  • task2_1: Calls a “sleeping_function” and sleeps for 5 seconds.
  • task2_2: Calls a “sleeping_function” and sleeps for 5 seconds.
  • task3: Calls a “bye_function”.
  • The task definitions will look like the following:
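
A sketch of the four task definitions, placed inside the with DAG(...) block above. The task_ids are assumptions, chosen so that sleepy_1 and sleepy_2 match the names that appear in the Gantt chart later on.

    task1 = PythonOperator(
        task_id="hello_task",            # assumed task_id
        python_callable=hello_function,
    )

    task2_1 = PythonOperator(
        task_id="sleepy_1",              # matches the name seen in the Gantt chart
        python_callable=sleeping_function,
    )

    task2_2 = PythonOperator(
        task_id="sleepy_2",
        python_callable=sleeping_function,
    )

    task3 = PythonOperator(
        task_id="bye_task",              # assumed task_id
        python_callable=bye_function,
    )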

Creating callable functions

  • task1 calls a function named hello_function that will print some text and sleep for 5 seconds using the time.sleep() function.
  • task2_1 and task2_2 will call a function named sleeping_function that will sleep for 5 seconds using the time.sleep() function.
  • task3 calls a function named bye_function that will print some pre-defined text and will indicate the end of the DAGRun.
  • The callable functions will look like this:
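
A sketch of the three callables; the printed messages are assumptions.

    def hello_function():
        print("Hello from task1!")             # assumed message
        time.sleep(5)                          # sleep for 5 seconds

    def sleeping_function():
        print("Sleeping for 5 seconds...")     # assumed message
        time.sleep(5)

    def bye_function():
        print("Bye! This DAG run is complete.")  # assumed message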

Setting Dependencies between Tasks

  • The dependencies will be set such that task2_1 and task2_2 execute together once task1 completes its execution, and task3 runs only after both task2_1 and task2_2 finish their execution.
  • The dependencies will look something as shown below in our case:
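
With Airflow’s bitshift syntax, those dependencies can be expressed in a single line:

    # task1 runs first, then task2_1 and task2_2 in parallel, then task3
    task1 >> [task2_1, task2_2] >> task3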

Voila, it’s a DAG file

After putting all the elements of the DAG together, our final code should look like this:

A DAG File
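
For reference, here is a minimal end-to-end sketch that combines the pieces above into one local_executor_demo.py; the task_ids, messages, and scheduling parameters remain assumptions.

    # dags/local_executor_demo.py
    import time
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def hello_function():
        print("Hello from task1!")
        time.sleep(5)


    def sleeping_function():
        print("Sleeping for 5 seconds...")
        time.sleep(5)


    def bye_function():
        print("Bye! This DAG run is complete.")


    with DAG(
        dag_id="local_executor_demo",
        description="Demonstrating parallel execution with the LocalExecutor",
        start_date=datetime(2022, 8, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        task1 = PythonOperator(task_id="hello_task", python_callable=hello_function)
        task2_1 = PythonOperator(task_id="sleepy_1", python_callable=sleeping_function)
        task2_2 = PythonOperator(task_id="sleepy_2", python_callable=sleeping_function)
        task3 = PythonOperator(task_id="bye_task", python_callable=bye_function)

        # task1 first, then the two sleepy tasks in parallel, then task3
        task1 >> [task2_1, task2_2] >> task3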

Code Reference

You can download and refer to the configuration file and the code file:

The Configuration File

The Code File

Execution of the DAG

  • To see the DAG running, activate the virtual environment and start your airflow webserver and scheduler.
  • Go to http://localhost:8080/home (or your dedicated port for airflow), and you should see the following on the web server UI:
Webserver Home Screen
  • Activate the “local_executor_demo” DAG, trigger it and go to the graph view. The DAGRun should look like this.
The DAG Run
  • You can also view the Gantt chart of the execution by selecting the Gantt option. The Gantt chart should look like this:
Gantt Chart
  • We can see that sleepy_1 (task2_1) and sleepy_2 (task2_2) were executed in parallel in the DAG.
  • Congratulations! We have implemented parallel execution in Apache-Airflow using the local executor.

And that’s it. I hope you liked the explanation of the Local Executor in Apache-Airflow and learned something valuable. Please let me know in the comment section if you have anything to share with me. I would love to know your thoughts.

Final Thoughts and Closing Comments

There are some vital points many people fail to understand while they pursue their journey in Computer Science, Data Science and AI. If you are one of them and looking for a way to counterbalance these cons, then Follow me and Subscribe for more forthcoming articles related to Python, Computer Science, Data Science, Machine Learning, and Artificial Intelligence.

If you find this read helpful, then hit the Clap👏. Your encouragement will catalyze inspiration to keep me going and develop more cool stuff like this.

