Apache Airflow (4): start_date & schedule_interval
Notes for Apache Airflow version 2.1.2. Last updated: 2022/06/26
Notes on scheduling, start time, and interval
default_args
start_date: Date at which tasks start being scheduled
schedule_interval: Interval of time, counted from min(start_date), at which the DAG is triggered
Airflow Default Interval
Example1:
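start_date = 2021-01-01 10:00:00
dagrun trigger time = execution_date + schedule_interval (a run is triggered at the end of its interval)
•schedule_interval = "@daily"
first execution_date = 2021-01-02 00:00:00 (the first midnight on or after start_date, a 14 hr wait); that run is triggered one interval later, at 2021-01-03 00:00:00
•schedule_interval = timedelta(days=1)
current time = 2021-01-02 10:00:00
first execution_date = 2021-01-01 10:00:00, triggered at 2021-01-02 10:00:00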
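The arithmetic in Example 1 can be sketched with the standard library alone. This is a simplified model, not Airflow's scheduler code: the helper names first_run_timedelta and first_run_daily are hypothetical, and the daily case assumes a midnight-aligned schedule (what Airflow's "@daily" preset means).

```python
from datetime import datetime, timedelta


def first_run_timedelta(start_date, interval):
    """First run of a timedelta schedule: the interval starts at start_date."""
    execution_date = start_date
    # The run is triggered once its interval has fully elapsed.
    return execution_date, execution_date + interval


def first_run_daily(start_date):
    """First run of a midnight-aligned daily schedule (assumed "@daily")."""
    midnight = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
    # Use the first midnight on or after start_date as the execution_date.
    if midnight == start_date:
        execution_date = midnight
    else:
        execution_date = midnight + timedelta(days=1)
    return execution_date, execution_date + timedelta(days=1)


start = datetime(2021, 1, 1, 10, 0, 0)
daily_exec, daily_trigger = first_run_daily(start)
delta_exec, delta_trigger = first_run_timedelta(start, timedelta(days=1))

print(daily_exec, daily_trigger)  # 2021-01-02 00:00:00 2021-01-03 00:00:00
print(delta_exec, delta_trigger)  # 2021-01-01 10:00:00 2021-01-02 10:00:00
```

In both cases the run does not start at its execution_date; it starts one schedule_interval later.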
Example2:
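start_date = datetime(2019, 10, 13, 15, 50), schedule_interval = timedelta(hours=1). (A cron schedule such as 0 * * * * or @hourly would instead align runs to the top of each hour, not to hh:50.)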
Case a) current_time is before start_date (2019-10-13 00:00): the DAG is first triggered at 2019-10-13 16:50, and every hour after that.
Actual run time: execution_date + schedule_interval
Case b) current_time is after start_date (2019-10-14 00:00): the DAG is triggered for 2019-10-13 16:50, 2019-10-13 17:50, 2019-10-13 18:50 … and keeps catching up until it reaches 2019-10-13 23:50.
It then waits until 2019-10-14 00:50 for the next run.
Please note that the catchup can be avoided by setting catchup=False in the DAG properties.
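Both cases can be reproduced with a small simulation. This is a rough sketch, not Airflow's own logic: the function trigger_times is hypothetical, it assumes a fixed timedelta(hours=1) interval so that runs fall at hh:50 as in the example, and it models catchup=False as "keep only the most recent due run".

```python
from datetime import datetime, timedelta


def trigger_times(start_date, interval, now, catchup=True):
    """Return (trigger times already due at `now`, next future trigger time)."""
    due = []
    execution_date = start_date
    # A run is due once execution_date + interval has passed.
    while execution_date + interval <= now:
        due.append(execution_date + interval)
        execution_date += interval
    next_trigger = execution_date + interval
    if not catchup:
        # Simplified model: skip the backlog, keep only the latest interval.
        due = due[-1:]
    return due, next_trigger


start = datetime(2019, 10, 13, 15, 50)
hourly = timedelta(hours=1)

# Case a) current time is before start_date: nothing is due yet.
due_a, next_a = trigger_times(start, hourly, datetime(2019, 10, 13, 0, 0))

# Case b) current time is after start_date: the backlog is caught up.
due_b, next_b = trigger_times(start, hourly, datetime(2019, 10, 14, 0, 0))

# Same as case b) but with catchup disabled.
due_c, next_c = trigger_times(start, hourly, datetime(2019, 10, 14, 0, 0), catchup=False)

print(next_a)                 # 2019-10-13 16:50:00
print(due_b[0], due_b[-1])    # 2019-10-13 16:50:00 2019-10-13 23:50:00
print(next_b)                 # 2019-10-14 00:50:00
```

With catchup disabled, due_c holds a single entry (23:50) instead of the eight backlog runs in due_b.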
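Additional note:
If you want the DAG to run on the same day, you can import Airflow's
built-in helper: from airflow.utils.dates import days_ago
Then set start_date: days_ago(1), which moves the start time back to the previous day.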
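To see why a start time one day in the past lets a daily DAG run today, here is a standard-library approximation. days_ago_sketch is a hypothetical stand-in for Airflow's days_ago (assumed here to return midnight UTC n days back), and the "now" value is fixed only to make the example reproducible.

```python
from datetime import datetime, timedelta, timezone


def days_ago_sketch(n, now=None):
    """Approximate days_ago(n): midnight UTC, n days before `now`."""
    now = now or datetime.now(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n)


# Assumed "current time" for illustration (the note's update date, 09:30 UTC).
now = datetime(2022, 6, 26, 9, 30, tzinfo=timezone.utc)

start_date = days_ago_sketch(1, now)            # midnight of the previous day
first_trigger = start_date + timedelta(days=1)  # execution_date + schedule_interval
runs_today = first_trigger <= now               # the first daily interval is already complete

print(start_date)     # 2022-06-25 00:00:00+00:00
print(first_trigger)  # 2022-06-26 00:00:00+00:00
print(runs_today)     # True
```

Because the first interval has already ended by the time the DAG is deployed, the scheduler has a run to trigger right away instead of waiting a full day.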
Reference
•https://forum.astronomer.io/t/airflow-start-date-concepts/393
•https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html