Airflow, the easy way

Julien Kervizic
Published in Hacking Analytics
5 min read, Dec 1, 2018

What is Airflow

Airflow is a data orchestration and scheduling platform; in layman's terms, it is a tool to manage your data flows and data operations. It enables better management of work that would otherwise have been set up as cron jobs. Airflow revolves around the concept of directed acyclic graphs (DAGs): collections of tasks organized in a directional manner that captures their dependencies.
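To make the DAG idea concrete, here is a minimal sketch in plain Python, using the standard library rather than Airflow itself. The task names (extract, transform, load, report) are hypothetical and only illustrate how declaring dependencies determines a valid run order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each key is a task, each value is the set of
# tasks that must finish before it can start.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(deps):
    """Return one run order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

Because the graph has to be acyclic, a circular dependency (a task that indirectly depends on itself) is rejected: graphlib raises a CycleError instead of returning an order.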

Airflow offers a management interface showing the status of every DAG run: whether it succeeded, failed, is still running, or is stuck in a retry mechanism.
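The statuses above can be pictured as a small state machine. The sketch below is an illustration in plain Python, not Airflow's implementation; the helper run_with_retries and the flaky task are hypothetical, and the state names mirror the ones the interface displays.

```python
import enum


class State(enum.Enum):
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    UP_FOR_RETRY = "up_for_retry"


def run_with_retries(task, retries=2):
    """Run `task` and return the list of states it passes through."""
    history = []
    for attempt in range(retries + 1):
        history.append(State.RUNNING)
        try:
            task()
        except Exception:
            # With attempts left the task waits for a retry;
            # after the last attempt it is marked as failed.
            history.append(State.UP_FOR_RETRY if attempt < retries else State.FAILED)
        else:
            history.append(State.SUCCESS)
            break
    return history


# Hypothetical task that fails on its first call and succeeds on the second.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 2:
        raise RuntimeError("transient error")


flaky_history = run_with_retries(flaky)
print([state.value for state in flaky_history])
```

A task that keeps failing ends in the failed state once its retries are used up, which is the point at which an operator usually wants to be notified.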

It is possible to dive deep into the status of the individual tasks of a DAG. One example is a DAG whose tasks pull data on sponsored products from Amazon's Ads API for a few European marketplaces, where each marketplace has its own set of tasks that run periodically. Airflow can also alert you on failure or on a missed SLA.
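As a rough sketch of the two ideas in this paragraph, the plain-Python snippet below generates one chain of tasks per marketplace and checks whether a run missed its SLA. The marketplace codes, step names, and both helper functions are assumptions made for illustration, and the SLA check is a simplification rather than Airflow's own logic.

```python
from datetime import datetime, timedelta

# Hypothetical marketplace codes and pipeline steps.
MARKETPLACES = ["UK", "DE", "FR"]
STEPS = ("request_report", "download_report", "load_report")


def build_tasks(marketplaces):
    """Return one ordered chain of task names per marketplace."""
    return {
        market: [f"{step}_{market.lower()}" for step in STEPS]
        for market in marketplaces
    }


def missed_sla(scheduled_at, finished_at, sla):
    """True when the task finished later than scheduled_at + sla."""
    return finished_at - scheduled_at > sla


tasks = build_tasks(MARKETPLACES)
print(tasks["DE"])

# A run scheduled at 06:00 that finishes at 07:30 misses a one-hour SLA.
late = missed_sla(
    scheduled_at=datetime(2018, 12, 1, 6, 0),
    finished_at=datetime(2018, 12, 1, 7, 30),
    sla=timedelta(hours=1),
)
print(late)
```

Generating the tasks in a loop keeps the marketplaces independent of each other, so a failure for one marketplace does not block the others.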

Setting up Airflow

Docker Images

Julien Kervizic

Living at the interstice of business, data and technology | Head of Data at iptiQ by SwissRe | previously at Facebook, Amazon | julienkervizic@gmail.com