Backfill Job on Airflow

Abubakar Alaro
Published in Geek Culture, Feb 14, 2023

Backfill is the process of running a DAG, or a specific task in a DAG, for past dates. For example, if a DAG has been running since the start of the month and a new task is added to it, that task has no runs for the days that have already elapsed. To execute it for those days, you need to backfill it.

Outline

  1. What is Backfill Job on Airflow
  2. How to run Backfill on Airflow
  3. Conclusion

I assume that the reader is familiar with Airflow as an orchestration tool and knows about DAGs in Airflow. If not, more information can be found here or here.

Backfill Job:

Another word for backfill is refill: it means redoing what has already been done. In Airflow, the backfill command runs (or re-runs) the DAG identified by dag_id once for every schedule interval between the specified start and end dates. It is also possible to re-run only specific tasks within a DAG.
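To make "all the intervals within the specified start and end date" concrete, here is a minimal sketch that lists the dates a backfill would cover. It assumes a daily schedule and that both the start and end dates are included; the function name daily_run_dates is hypothetical and is not part of Airflow.

```python
from __future__ import annotations

from datetime import date, timedelta


def daily_run_dates(start: date, end: date) -> list[date]:
    """Return one date per day from start to end, inclusive.

    Assumption: the DAG is scheduled daily, so each date stands for one run.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(days + 1)]


# A backfill from Feb 1 to Feb 3 covers three daily runs in this model.
for run_date in daily_run_dates(date(2023, 2, 1), date(2023, 2, 3)):
    print(run_date.isoformat())
```

A DAG with a different schedule (hourly, weekly) would produce a different number of runs for the same date range.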

How to run Backfill on Airflow:

The following command runs a backfill job on a DAG with the id my_example_dag:

airflow backfill my_example_dag -s [start_date] -e [end_date]

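Note that this runs all the tasks in the specified DAG, my_example_dag. The command above uses the Airflow 1.x syntax; in Airflow 2.x the same command lives under the dags group: airflow dags backfill my_example_dag -s [start_date] -e [end_date]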
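When the dates come from a script rather than being typed by hand, it can help to assemble the command programmatically. The sketch below only builds and prints the command; it does not run Airflow. The helper build_backfill_command and its airflow_2 switch are hypothetical names introduced for illustration. The switch reflects that Airflow 2.x uses airflow dags backfill where 1.x used airflow backfill.

```python
import shlex
from datetime import date


def build_backfill_command(dag_id, start, end, airflow_2=False):
    """Build the backfill command as an argument list (nothing is executed)."""
    if end < start:
        raise ValueError("end date must not be before start date")
    # Airflow 2.x groups DAG commands under "airflow dags".
    prefix = ["airflow", "dags", "backfill"] if airflow_2 else ["airflow", "backfill"]
    # The CLI accepts dates in YYYY-MM-DD form, which isoformat() produces.
    return prefix + [dag_id, "-s", start.isoformat(), "-e", end.isoformat()]


command = build_backfill_command("my_example_dag", date(2023, 2, 1), date(2023, 2, 14))
print(shlex.join(command))
```

On a machine where Airflow is installed, the returned list could be passed to subprocess.run instead of being printed.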

If only specific tasks need to run, add the -t flag (--task-regex). It takes a regular expression that is matched against task ids, so a plain task id is the simplest pattern. The command looks like this:

airflow backfill my_example_dag -s [start_date] -e [end_date] -t [task_id]

In order to execute just the specified task without running its upstream tasks, add the -i flag (--ignore-dependencies). This flag works only when the -t flag is also specified.
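The interaction between -t and -i can be pictured with a toy model. The sketch below is not Airflow code: the UPSTREAM mapping, the task names, and select_tasks are all hypothetical. It assumes the pattern may match anywhere in a task id, and that upstream tasks are pulled in unless dependencies are ignored.

```python
import re

# Hypothetical DAG: task id -> ids of its direct upstream tasks.
UPSTREAM = {
    "extract": [],
    "transform": ["extract"],
    "load": ["transform"],
    "report": ["load"],
}


def select_tasks(task_regex, ignore_dependencies=False):
    """Return the task ids a filtered backfill would cover in this toy model."""
    # Equivalent of -t: keep tasks whose id matches the pattern.
    matched = {task for task in UPSTREAM if re.search(task_regex, task)}
    if ignore_dependencies:
        # Equivalent of -i: run only the matched tasks.
        return sorted(matched)
    # Without -i: walk upstream edges and include every ancestor.
    selected = set()
    stack = list(matched)
    while stack:
        task = stack.pop()
        if task not in selected:
            selected.add(task)
            stack.extend(UPSTREAM[task])
    return sorted(selected)


print(select_tasks("load"))                            # ['extract', 'load', 'transform']
print(select_tasks("load", ignore_dependencies=True))  # ['load']
```

Downstream tasks such as report are never selected here, because only the matched task and, optionally, its upstream tasks are in scope.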

Note that the DAG to be backfilled must be unpaused (set to active).

In some cases, Airflow throws the following exception:

AirflowException("You cannot use the --pickle option when using DAG.cli() method.").

One way to resolve this is to specify the -x or --donot-pickle flag. This tells Airflow not to pickle the DAG object and send it to the workers; instead, the workers run their own version of the code. Airflow has a concept called DagPickle, which represents a version of a DAG that becomes the source of truth for a BackfillJob execution. Read more here.
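For readers unfamiliar with the term, a pickle is a native Python serialized object. The sketch below uses only the standard library pickle module on a hypothetical stand-in dictionary. It is not Airflow's DagPickle and only illustrates the idea of turning an object into bytes that can be shipped elsewhere and rebuilt.

```python
import pickle

# Hypothetical stand-in for a DAG definition; a real Airflow DAG is a richer object.
dag_snapshot = {"dag_id": "my_example_dag", "tasks": ["extract", "load"]}

payload = pickle.dumps(dag_snapshot)  # serialize the object to bytes
restored = pickle.loads(payload)      # rebuild an equal object from those bytes

print(type(payload).__name__)
print(restored == dag_snapshot)
```

With -x, this serialization step is skipped and each worker loads the DAG from its own copy of the code.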

Conclusion:
Backfill is a very useful feature in Airflow. It allows you to run or re-run a DAG, or specific tasks in a DAG, for past dates.
