Apache Airflow (4): start_date & schedule_interval

Notes for Apache Airflow 2.1.2. Last updated: 2022/06/26

Notes on scheduling, start time, and interval.

default_args

start_date: the date from which tasks start being scheduled
schedule_interval: the interval of time, counted from min(start_date), at which the DAG is triggered

Airflow Default Interval

Example1:

start_date = 2021-01-01 10:00:00
first dagrun = start_date + schedule_interval

•schedule_interval = "@daily"
first execution_date = 2021-01-02 00:00:00 (the first midnight after start_date, a 14 hr wait)

•schedule_interval = timedelta(days=1)
first execution_date = 2021-01-01 10:00:00
first run triggered at = 2021-01-02 10:00:00
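
The arithmetic in Example1 can be sketched in plain Python. This is only a simplified model of the behaviour described above, not Airflow's scheduler code. The function names are hypothetical, and the midnight-aligned case assumes a schedule like "@daily", which is what the 14 hr wait suggests.

```python
from datetime import datetime, timedelta


def first_run_timedelta(start_date, interval):
    """First run for a fixed timedelta interval.

    The first execution_date equals start_date, and the run is
    triggered once that interval has fully elapsed.
    """
    execution_date = start_date
    trigger_time = start_date + interval
    return execution_date, trigger_time


def first_execution_date_daily(start_date):
    """First execution_date for a midnight-aligned schedule (assumed "@daily").

    Simplified: returns start_date itself if it is exactly midnight,
    otherwise the next midnight.
    """
    midnight = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
    if midnight == start_date:
        return start_date
    return midnight + timedelta(days=1)


start = datetime(2021, 1, 1, 10, 0, 0)

execution_date, trigger_time = first_run_timedelta(start, timedelta(days=1))
print(execution_date)  # 2021-01-01 10:00:00
print(trigger_time)    # 2021-01-02 10:00:00

aligned = first_execution_date_daily(start)
print(aligned)          # 2021-01-02 00:00:00
print(aligned - start)  # 14:00:00
```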

Example2:

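start_date = datetime(2019, 10, 13, 15, 50), schedule_interval = timedelta(hours=1) (a cron schedule such as 0 * * * * or @hourly would align runs to the top of the hour rather than to :50)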

Case a) current_time is before start_date (2019-10-13 00:00): your DAG's first run will be scheduled at 2019-10-13 16:50, and subsequently every hour.
Actual run time: execution_date + schedule_interval

Case b) current_time is after start_date (2019-10-14 00:00): your DAG's runs will be scheduled for 2019-10-13 16:50, 2019-10-13 17:50, 2019-10-13 18:50 … catching up until it reaches 2019-10-13 23:50.
It will then wait until 2019-10-14 00:50 for the next run.
Please note that the catchup can be avoided by setting catchup=False in the DAG properties.
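
The two cases can be checked with a small simulation. This is a rough sketch that assumes a fixed one-hour timedelta interval, which is what produces the :50 trigger times; it does not model cron alignment. due_trigger_times is a hypothetical helper, and its catchup=False branch is a simplification of what the scheduler does.

```python
from datetime import datetime, timedelta


def due_trigger_times(start_date, interval, now, catchup=True):
    """Trigger times of all runs that are due at `now`.

    A run covering [execution_date, execution_date + interval) is
    triggered at the end of its interval.
    """
    triggers = []
    execution_date = start_date
    while execution_date + interval <= now:
        triggers.append(execution_date + interval)
        execution_date += interval
    if not catchup:
        # Simplification: keep only the most recent completed interval.
        triggers = triggers[-1:]
    return triggers


start = datetime(2019, 10, 13, 15, 50)
hourly = timedelta(hours=1)

# Case a) now is before start_date: nothing is due yet.
print(due_trigger_times(start, hourly, datetime(2019, 10, 13, 0, 0)))  # []

# Case b) now is after start_date: the missed runs are caught up.
runs = due_trigger_times(start, hourly, datetime(2019, 10, 14, 0, 0))
print(len(runs))  # 8
print(runs[0])    # 2019-10-13 16:50:00
print(runs[-1])   # 2019-10-13 23:50:00

# With catchup=False only the latest run remains in this model.
latest = due_trigger_times(start, hourly, datetime(2019, 10, 14, 0, 0), catchup=False)
print(latest[0])  # 2019-10-13 23:50:00
```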

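Additional note:

If you want the DAG to run on the same day, one option is to import Airflow's built-in helper, from airflow.utils.dates import days_ago, and set start_date: days_ago(1) so that the start time is the previous day.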
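
To see why a start date of one day ago leads to a run on the same day, here is a stdlib-only sketch. midnight_days_ago is a hypothetical stand-in that mimics what days_ago(n) returns as I understand it (UTC midnight, n days before today). The fixed `now` value and the daily interval are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def midnight_days_ago(n, now=None):
    """Hypothetical stand-in for days_ago(n): UTC midnight, n days before today."""
    now = now or datetime.now(timezone.utc)
    today_midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today_midnight - timedelta(days=n)


# Fixed "current time" so the output is reproducible (assumed value).
now = datetime(2022, 6, 26, 9, 30, tzinfo=timezone.utc)

start_date = midnight_days_ago(1, now)
print(start_date)  # 2022-06-25 00:00:00+00:00

# With a daily interval, the first interval [start_date, start_date + 1 day)
# has already ended by `now`, so a run is due right away.
print(start_date + timedelta(days=1) <= now)  # True
```

In an actual DAG file the equivalent would be passing start_date=days_ago(1), for example inside default_args.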

Reference

https://forum.astronomer.io/t/airflow-start-date-concepts/393

https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html
