Airflow Trigger Rules: A Comprehensive Guide
Orchestrating complex workflows efficiently is key to successful data pipeline management.
One critical aspect of this orchestration is defining how tasks within a Directed Acyclic Graph (DAG) interact with each other.
This is where trigger rules come into play.
Trigger rules determine when a task should execute based on the states of its direct upstream tasks. Airflow offers a variety of trigger rules, each serving a different purpose and giving you fine-grained control over workflow behavior. The example DAG below wires seven tasks together with a mix of rules so we can see several of them in action.
```python
# Import necessary libraries
import airflow
from airflow.models import DAG
from airflow.operators.python import PythonOperator

# Define default arguments
default_args = {
    'owner': 'Airflow',
    'start_date': airflow.utils.dates.days_ago(1),
}

# Define Python functions representing tasks.
# The two download tasks raise deliberately, to exercise the trigger rules;
# the commented-out raises in the others can be toggled to experiment.
def download_website_a():
    print("Downloading website A")
    raise ValueError("error")

def download_website_b():
    print("Downloading website B")
    raise ValueError("error")

def download_failed():
    print("Download failed")
    # raise ValueError("error")

def download_succeed():
    print("Download succeeded")
    # raise ValueError("error")

def process():
    print("Processing data")
    # raise ValueError("error")

def notif_a():
    print("Notification A")
    # raise ValueError("error")

def notif_b():
    print("Notification B")
    # raise ValueError("error")

# Define the DAG
with DAG(dag_id='trigger_rule_dag',
         default_args=default_args,
         schedule_interval="@daily") as dag:

    # Define tasks with their corresponding trigger rules
    download_website_a_task = PythonOperator(
        task_id='download_website_a',
        python_callable=download_website_a,
        trigger_rule="all_success"
    )
    download_website_b_task = PythonOperator(
        task_id='download_website_b',
        python_callable=download_website_b,
        trigger_rule="all_success"
    )
    download_failed_task = PythonOperator(
        task_id='download_failed',
        python_callable=download_failed,
        trigger_rule="all_failed"
    )
    download_succeed_task = PythonOperator(
        task_id='download_succeed',
        python_callable=download_succeed,
        trigger_rule="all_success"
    )
    process_task = PythonOperator(
        task_id='process',
        python_callable=process,
        trigger_rule="one_success"
    )
    notif_a_task = PythonOperator(
        task_id='notif_a',
        python_callable=notif_a,
        trigger_rule="none_failed"
    )
    notif_b_task = PythonOperator(
        task_id='notif_b',
        python_callable=notif_b,
        trigger_rule="one_failed"
    )

    # Define task dependencies
    [download_website_a_task, download_website_b_task] >> download_succeed_task
    [download_website_a_task, download_website_b_task] >> download_failed_task
    [download_failed_task, download_succeed_task] >> process_task >> [notif_a_task, notif_b_task]
```
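Because both download callables raise ValueError on purpose, the outcome of a run of this DAG is fully determined. The standalone sketch below (no Airflow required; the task names and rules are copied from the DAG above, and the state propagation is a deliberately simplified stand-in for the real scheduler) walks the graph and derives each task's final state.

```python
# Simplified simulation of the example DAG: derive each task's final state
# from its upstreams' states and its trigger rule. States used here:
# 'success', 'failed', 'skipped', 'upstream_failed'. Real Airflow scheduling
# has more nuance than this model captures.

UPSTREAMS = {
    "download_website_a": [],
    "download_website_b": [],
    "download_succeed": ["download_website_a", "download_website_b"],
    "download_failed": ["download_website_a", "download_website_b"],
    "process": ["download_failed", "download_succeed"],
    "notif_a": ["process"],
    "notif_b": ["process"],
}
RULES = {
    "download_website_a": "all_success",
    "download_website_b": "all_success",
    "download_succeed": "all_success",
    "download_failed": "all_failed",
    "process": "one_success",
    "notif_a": "none_failed",
    "notif_b": "one_failed",
}
FAILS = {"download_website_a", "download_website_b"}  # these callables raise

def resolve(task, memo):
    """Recursively compute a task's final state from its upstream states."""
    if task not in memo:
        up = [resolve(t, memo) for t in UPSTREAMS[task]]
        bad = ("failed", "upstream_failed")
        may_run = {
            "all_success": all(s == "success" for s in up),
            "all_failed": all(s in bad for s in up),
            "one_success": any(s == "success" for s in up),
            "one_failed": any(s in bad for s in up),
            "none_failed": all(s not in bad for s in up),
        }[RULES[task]]
        if may_run:
            memo[task] = "failed" if task in FAILS else "success"
        elif RULES[task] == "all_success" and any(s in bad for s in up):
            memo[task] = "upstream_failed"
        else:
            memo[task] = "skipped"
    return memo[task]

states = {}
for t in UPSTREAMS:
    resolve(t, states)
print(states)
# Both downloads fail, so download_failed runs, download_succeed is marked
# upstream_failed, process and notif_a run, and notif_b is skipped.
```

Running this confirms the intuition behind the dependency wiring: exactly one of the `download_failed`/`download_succeed` branch tasks runs, `process` proceeds either way thanks to `one_success`, and exactly one notification fires.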
Trigger Rules
Now, let’s dissect the trigger rules applied to each task and understand their implications:
all_success
This rule dictates that the task executes only if all of its direct upstream tasks have succeeded. In our example, both download_website_a_task and download_website_b_task must succeed for download_succeed_task to run; since both callables deliberately raise ValueError, download_succeed_task instead ends up in the upstream_failed state.

all_failed
This rule specifies that the task executes only if all of its direct upstream tasks have failed. Here, download_failed_task runs only if both download_website_a_task and download_website_b_task fail, which, given the deliberate exceptions, is exactly what happens.

one_success
This rule indicates that the task executes as soon as at least one of its direct upstream tasks succeeds. process_task runs as soon as either download_failed_task or download_succeed_task completes successfully.

none_failed
This rule ensures that the task executes only if none of its direct upstream tasks have failed (both success and skipped are acceptable). In our DAG, the direct upstream of notif_a_task is process_task, so notif_a_task runs as long as process_task does not fail.

one_failed
This rule mandates that the task executes as soon as at least one of its direct upstream tasks fails. The direct upstream of notif_b_task is process_task, so notif_b_task runs only if process_task fails; otherwise it is skipped.

none_skipped
This rule stipulates that the task executes only if none of its direct upstream tasks were skipped. It is not demonstrated in the example DAG.
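The rules above can be made concrete with a minimal, Airflow-free model: each trigger rule is essentially a predicate over the terminal states of a task's direct upstream tasks. (Real scheduling has more nuance, e.g. most rules also wait for all upstreams to finish, and upstream_failed is tracked separately; the sketch below exists only to pin down the rule semantics.)

```python
# Minimal model of trigger-rule semantics: a predicate over the terminal
# states of a task's *direct* upstream tasks.

def may_run(trigger_rule, upstream_states):
    """True if a task with this rule may run, given upstream terminal
    states drawn from 'success', 'failed', 'skipped'."""
    s = list(upstream_states)
    predicates = {
        "all_success": all(st == "success" for st in s),
        "all_failed": all(st == "failed" for st in s),
        "one_success": any(st == "success" for st in s),
        "one_failed": any(st == "failed" for st in s),
        "none_failed": all(st != "failed" for st in s),    # skipped is fine
        "none_skipped": all(st != "skipped" for st in s),  # failed is fine
    }
    return predicates[trigger_rule]

# Both download tasks in the example raise ValueError, so their states are:
downloads = ["failed", "failed"]
print(may_run("all_success", downloads))  # False: download_succeed can't run
print(may_run("all_failed", downloads))   # True:  download_failed runs
print(may_run("none_failed", ["success", "skipped"]))   # True
print(may_run("none_skipped", ["success", "failed"]))   # True
```

Note in particular the asymmetry the last two calls illustrate: none_failed tolerates skipped upstreams, while none_skipped tolerates failed ones.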
Trigger rules play a crucial role in governing task execution within Airflow DAGs, providing flexibility and control over workflow behavior.
By strategically applying trigger rules, you can design robust and efficient data pipelines tailored to your specific requirements.