Local Executor in Apache-Airflow

(A Guide to Setup & Running of Tasks using Local Executor)

Mukesh Kumar
Accredian
5 min read · Aug 19, 2022


Preface

Previously, you saw a summary of some of the built-in executors that Apache-Airflow offers. You also learned that the local executor enables users to run multiple task instances in parallel, and that it works with either a MySQL or a PostgreSQL database, since these databases allow multiple connections, which is what lets us achieve parallelism.

If you are interested in more details, the airflow documentation covers the limited and unlimited parallelism modes that the LocalExecutor can offer. Please note that we are not covering the sequential executor, the default executor of Apache-Airflow, because it runs tasks one by one and doesn’t offer parallelism.

In the rest of the story, you will discover the setup of the Local Executor and its execution mechanism. So, let’s get started.

Setup for Local Executor

  • Before we start implementing and executing our dags with the local executor, we need to set up MySQL on our system (the steps work on both WSL2 and Ubuntu) and configure our airflow application to use it.
  • You can refer to the airflow documentation for configuring MySQL with airflow.
  • After setting up the MySQL database, navigate to the airflow.cfg file in the airflow directory, search for the executor variable, and change its value from SequentialExecutor to LocalExecutor, as sketched after this list.
  • Save the file and run the airflow webserver and scheduler to check whether the settings work correctly with the LocalExecutor. If yes, we can proceed with DAG creation. If not, let me know in the comments with details, and I will try to help you in the best possible manner.
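
As a rough illustration, the relevant entries in airflow.cfg would look something like the snippet below. The connection string is only a placeholder, assuming a local MySQL database named airflow_db with a dedicated airflow_user; replace it with your own details.

    [core]
    # Switch from the default SequentialExecutor to the LocalExecutor
    executor = LocalExecutor

    [database]
    # Placeholder connection string; use your own MySQL credentials.
    # In Airflow versions before 2.3, this setting lives under [core] instead.
    sql_alchemy_conn = mysql+mysqldb://airflow_user:airflow_pass@localhost:3306/airflow_db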

Creating a Python File

  • Similar to the previous article, we will create a DAG file consisting of multiple tasks and set the dependencies between tasks that require parallel execution.
  • Create a new Python file named “local_executor_demo.py” inside the dags directory of the airflow directory on your system, and open it in your favorite editor.
  • We highly recommend opening the entire airflow directory in your editor because it makes navigating the project much easier.

Importing the Necessary Modules

  • We will import the “DAG” module and the “operators.python” module from the airflow package.
  • In addition, we will import the “datetime” module to help us schedule the dags.
  • Finally, we will import the time module to observe the execution of tasks in the DAG. A sketch of these imports follows this list.
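
A minimal sketch of those imports (the exact import style may differ slightly from the original file):

    import time                            # used to make the tasks sleep
    from datetime import datetime          # used to schedule the DAG

    from airflow import DAG                                # the DAG module
    from airflow.operators.python import PythonOperator    # the operators.python module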

Creating a DAG object

  • Next, we will initialize a DAG object as follows:
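
Here is a minimal sketch of the DAG object, using the dag_id “local_executor_demo” referenced later in the article; the start date, schedule, and catchup settings are illustrative assumptions.

    with DAG(
        dag_id="local_executor_demo",
        description="Demonstrating parallel execution with the LocalExecutor",
        start_date=datetime(2022, 8, 1),   # assumed start date
        schedule_interval="@daily",        # assumed schedule
        catchup=False,                     # skip backfilling of past runs
    ) as dag:
        ...                                # the tasks from the next section go here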

Creating Some Tasks

  • We will create four tasks inside our DAG, named task1, task2_1, task2_2, and task3.
  • task1: Calls a “hello_function” and sleeps for 5 seconds.
  • task2_1: Calls a “sleeping_function” and sleeps for 5 seconds.
  • task2_2: Calls a “sleeping_function” and sleeps for 5 seconds.
  • task3: Calls a “bye_function”.
  • The task definitions will look like the following:
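
A sketch of the four task definitions, placed inside the with DAG(...) block above. The task_ids are assumptions, chosen so that sleepy_1 and sleepy_2 match the names that appear in the Gantt chart later on.

    task1 = PythonOperator(
        task_id="hello_task",            # assumed task_id
        python_callable=hello_function,
    )

    task2_1 = PythonOperator(
        task_id="sleepy_1",              # matches the name seen in the Gantt chart
        python_callable=sleeping_function,
    )

    task2_2 = PythonOperator(
        task_id="sleepy_2",
        python_callable=sleeping_function,
    )

    task3 = PythonOperator(
        task_id="bye_task",              # assumed task_id
        python_callable=bye_function,
    )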

Creating callable functions

  • task1 calls a function named hello_function that will print some text and sleep for 5 seconds using the time.sleep() function.
  • task2_1 and task2_2 will call a function named sleeping_function that will sleep for 5 seconds using the time.sleep() function.
  • task3 calls a function named bye_function that will print some pre-defined text and will indicate the end of the DAGRun.
  • The callable functions will look like this:
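
A sketch of the three callables; the printed messages are assumptions.

    def hello_function():
        print("Hello from task1!")             # assumed message
        time.sleep(5)                          # sleep for 5 seconds

    def sleeping_function():
        print("Sleeping for 5 seconds...")     # assumed message
        time.sleep(5)

    def bye_function():
        print("Bye! This DAG run is complete.")  # assumed message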

Setting Dependencies between Tasks

  • The dependencies will be set such that task2_1 and task2_2 execute together once task1 completes its execution, and task3 runs only after both task2_1 and task2_2 finish their execution.
  • The dependencies will look something as shown below in our case:
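
With Airflow’s bitshift syntax, those dependencies can be expressed in a single line:

    # task1 runs first, then task2_1 and task2_2 in parallel, then task3
    task1 >> [task2_1, task2_2] >> task3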

Voila, it’s a DAG file

After putting all the elements of the DAG together, our final code should look like this:

A DAG File
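
For reference, here is a minimal end-to-end sketch that combines the pieces above into one local_executor_demo.py; the task_ids, messages, and scheduling parameters remain assumptions.

    # dags/local_executor_demo.py
    import time
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def hello_function():
        print("Hello from task1!")
        time.sleep(5)


    def sleeping_function():
        print("Sleeping for 5 seconds...")
        time.sleep(5)


    def bye_function():
        print("Bye! This DAG run is complete.")


    with DAG(
        dag_id="local_executor_demo",
        description="Demonstrating parallel execution with the LocalExecutor",
        start_date=datetime(2022, 8, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        task1 = PythonOperator(task_id="hello_task", python_callable=hello_function)
        task2_1 = PythonOperator(task_id="sleepy_1", python_callable=sleeping_function)
        task2_2 = PythonOperator(task_id="sleepy_2", python_callable=sleeping_function)
        task3 = PythonOperator(task_id="bye_task", python_callable=bye_function)

        # task1 first, then the two sleepy tasks in parallel, then task3
        task1 >> [task2_1, task2_2] >> task3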

Code Reference

You can download and refer to the configuration file and the code file:

The Configuration File

The Code File

Execution of the DAG

  • To see the DAG running, activate the virtual environment and start your airflow webserver and scheduler.
  • Go to http://localhost:8080/home (or your dedicated port for airflow), and you should see the following on the web server UI:
Webserver Home Screen
  • Activate the “local_executor_demo” DAG, trigger it and go to the graph view. The DAGRun should look like this.
The DAG Run
  • You can also view the Gantt chart of the execution by selecting the Gantt option. The Gantt chart should look like this:
Gantt Chart
  • We can see that sleepy_1 (task2_1) and sleepy_2 (task2_2) were executed in parallel in the DAG.
  • Congratulations! We have implemented parallel execution in Apache-Airflow using the local executor.

And that’s it. I hope you liked the explanation of the Local Executor in Apache-Airflow and learned something valuable. Please let me know in the comment section if you have anything to share with me. I would love to know your thoughts.

Final Thoughts and Closing Comments

There are some vital points many people fail to understand while they pursue their journey in Computer Science, Data Science and AI. If you are one of them and looking for a way to counterbalance these cons, then Follow me and Subscribe for more forthcoming articles related to Python, Computer Science, Data Science, Machine Learning, and Artificial Intelligence.

If you find this read helpful, then hit the Clap👏. Your encouragement will catalyze inspiration to keep me going and develop more cool stuff like this.

