Local Executor in Apache-Airflow
(A Guide to Setup & Running of Tasks using Local Executor)
Preface
Previously, you went through a summary of some of the inbuilt executors that Apache-Airflow offers. You also learned that the local executor enables users to run multiple task instances in parallel, and that it uses either a MySQL or PostgreSQL database, since these allow multiple connections, which helps us achieve parallelism.
If you are interested in more details, you can read about the limited and unlimited parallelism the LocalExecutor can offer in the airflow documentation. Please note that we are not covering the sequential executor, Apache-Airflow's default, because it runs tasks one by one and doesn't offer parallelism.
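As a rough illustration of the two modes (a sketch, not Airflow's actual code): with limited parallelism the executor keeps a fixed pool of worker processes consuming tasks, while with unlimited parallelism (parallelism set to 0) it spawns a fresh process per task. A minimal sketch using Python's multiprocessing, where run_task is a hypothetical stand-in for executing one task instance:

```python
import multiprocessing as mp

def run_task(name):
    # stand-in for executing one Airflow task instance
    return f"ran {name}"

def limited(tasks, parallelism):
    # fixed pool of workers, analogous to LocalExecutor with parallelism > 0
    with mp.Pool(processes=parallelism) as pool:
        return pool.map(run_task, tasks)

def unlimited(tasks):
    # one process per task, analogous to LocalExecutor with parallelism = 0
    procs = [mp.Process(target=run_task, args=(t,)) for t in tasks]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

if __name__ == "__main__":
    print(limited(["t1", "t2", "t3"], parallelism=2))
```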
In the rest of the story, you will discover the setup of the Local Executor and its execution mechanism. So, let’s get started.
Setup for Local Executor
- Before we start implementing and executing our dags with the local executor, we need to set up MySQL on our system (the steps work on WSL2 and Ubuntu) and configure our airflow application to use it.
- You can refer to the airflow documentation for configuring MySQL with airflow.
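For reference, the database setup generally boils down to creating a database and a user that airflow can connect with. The names and password below are placeholders; pick your own:

```sql
CREATE DATABASE airflow_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'airflow_user'@'%' IDENTIFIED BY 'airflow_pass';
GRANT ALL PRIVILEGES ON airflow_db.* TO 'airflow_user'@'%';
```

You then point airflow at this database by setting the sql_alchemy_conn value in airflow.cfg to something like mysql+mysqldb://airflow_user:airflow_pass@localhost:3306/airflow_db (see the airflow documentation for the exact driver and options for your setup).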
- After setting up the MySQL database, navigate to the airflow.cfg file in the airflow directory, search for the "executor" variable, and change its value from SequentialExecutor to LocalExecutor.
- Save the file and run the airflow webserver and scheduler to check if the settings are working correctly with the LocalExecutor. If yes, we can proceed with the DAG creation. If not, let me know in the comments with details, and I will try to help you in the best possible manner.
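The relevant portion of airflow.cfg should read:

```ini
[core]
executor = LocalExecutor
```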
Creating a Python File
- Similar to the previous article, we will create a DAG file consisting of multiple tasks and set the dependencies between tasks that require parallel execution.
- Create a new Python file named "local_executor_demo.py" in the dags directory of your airflow directory, and open it in your favorite editor.
- We highly recommend opening the entire airflow directory in your editor, as it makes navigating the project much easier.
Importing the Necessary Modules
- We will import the “DAG” module and the “operators.python” module from the airflow package.
- In addition, we will import the “datetime” module to help us schedule the dags.
- Finally, we will import the time module to observe the execution of tasks in the DAG.
from airflow import DAG
from airflow.operators.python import PythonOperator
from datetime import datetime, timedelta
import time
Creating a DAG object
- Next, we will initialize a DAG object as follows:
with DAG(
    dag_id="local_executor_demo",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False
) as dag:
Creating Some Tasks
- We will create four tasks inside our DAG, named task1, task2_1, task2_2, and task3.
- task1: Calls a “hello_function” and sleeps for 5 seconds.
- task2_1: Calls a “sleeping_function” and sleeps for 5 seconds.
- task2_2: Calls a “sleeping_function” and sleeps for 5 seconds.
- task3: Calls a "last_function" (with task_id "bye_function").
- The task definitions will look like the following:
task1 = PythonOperator(
    task_id="hello_function",
    python_callable=hello_function
)

task2_1 = PythonOperator(
    task_id="sleepy_1",
    python_callable=sleeping_function
)

task2_2 = PythonOperator(
    task_id="sleepy_2",
    python_callable=sleeping_function
)

task3 = PythonOperator(
    task_id="bye_function",
    python_callable=last_function
)
Creating callable functions
- task1 calls a function named hello_function that will print some text and sleep for 5 seconds using the time.sleep() function.
- task2_1 and task2_2 will call a function named sleeping_function that will sleep for 5 seconds using the time.sleep() function.
- task3 calls a function named last_function that will print some pre-defined text and indicate the end of the DAGRun.
- The callable functions will look like this:
def hello_function():
    print('Hello, this is the first task of the DAG')
    time.sleep(5)

def last_function():
    print('DAG run is done.')

def sleeping_function():
    print("Sleeping for 5 seconds")
    time.sleep(5)
Setting Dependencies between Tasks
- The dependencies will be set such that task2_1 and task2_2 will execute together once task1 completes its execution. Task task3 will run only after both task2_1 and task2_2 finish their execution.
- The dependencies will look something as shown below in our case:
task1 >> [task2_1, task2_2] >> task3
Voila, it’s a DAG file
After compiling all the elements of the DAG, our final code should look like this:
Execution of the DAG
- To see the file running, activate the virtual environment and start your airflow webserver and scheduler.
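With the virtual environment active, the two services can be started with the standard airflow CLI commands (typically in separate terminals):

```shell
# start the web UI (default port 8080)
airflow webserver --port 8080

# in a second terminal, start the scheduler
airflow scheduler
```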
- Go to http://localhost:8080/home (or your dedicated port for airflow), and you should see the "local_executor_demo" DAG listed in the web server UI.
- Activate the "local_executor_demo" DAG, trigger it, and open the graph view to watch the DAGRun progress.
- You can also view the Gantt chart of the execution by selecting the Gantt option; it shows each task's start time and duration.
- From the Gantt chart, we can see that sleepy_1 (task2_1) and sleepy_2 (task2_2) executed in parallel in the DAG.
- Congratulations! We have implemented parallel execution in Apache-Airflow using the local executor.
That’s it. I hope you liked the explanation of the Local Executor in Apache-Airflow and learned something valuable. Please let me know in the comment section if you have anything to share with me. I would love to hear your thoughts.
Final Thoughts and Closing Comments
There are some vital points many people fail to understand while pursuing their journey in Computer Science, Data Science, and AI. If you are one of them and looking for a way to fill those gaps, then Follow me and Subscribe for more forthcoming articles related to Python, Computer Science, Data Science, Machine Learning, and Artificial Intelligence.
If you find this read helpful, then hit the Clap👏. Your encouragement will catalyze inspiration to keep me going and develop more cool stuff like this.