5 considerations to have when using Airflow

Julien Kervizic
Published in Hacking Analytics
6 min read · May 14, 2019


In previous posts, I explained the basics of Airflow and how to set up Airflow on Azure. I haven’t, however, covered what you should consider when using Airflow.

I see five primary considerations when using Airflow:

  • What type of infrastructure to set up to support it
  • What kind of operator model to abide by, and which operators to choose
  • How to architect your different DAGs and set up your tasks
  • Whether to leverage templated code or not
  • Whether and how to use its REST API

These considerations will dictate how you and your team will be using Airflow and how it will be managed.

(1) Airflow Infrastructure — Go for a Managed Service if Possible

Setting up and maintaining Airflow isn’t easy. If you need to set it up yourself, you will most likely need quite a bit more than the base image:

  • Encryption needs to be set up to safely store secrets and credentials
  • An authorization layer needs to be set up, if only through the Flask-Login setup and preferably through an OAuth2 provider such as Google
  • SSL needs to be configured
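
To give a feel for what these three items involve, here is a minimal sketch that builds the matching environment-variable overrides, which Airflow reads in the form AIRFLOW__<SECTION>__<KEY>. It rests on several assumptions: the option names are those of the Airflow 1.10 series and should be checked against the version you run, the certificate paths are placeholders, and the helper functions are hypothetical names used only for illustration. The Google backend also needs its own client credentials, which are left out here.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return a random key in the format Fernet expects:
    32 random bytes, URL-safe base64-encoded."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def airflow_env(cert_path: str, key_path: str) -> dict:
    """Build environment overrides (hypothetical helper, Airflow 1.10-style names)."""
    return {
        # Encryption: key used to encrypt stored connection credentials.
        "AIRFLOW__CORE__FERNET_KEY": generate_fernet_key(),
        # SSL: serve the web UI over HTTPS with this certificate and key.
        "AIRFLOW__WEBSERVER__WEB_SERVER_SSL_CERT": cert_path,
        "AIRFLOW__WEBSERVER__WEB_SERVER_SSL_KEY": key_path,
        # Login: require authentication, here through the Google OAuth2 backend.
        "AIRFLOW__WEBSERVER__AUTHENTICATE": "True",
        "AIRFLOW__WEBSERVER__AUTH_BACKEND": "airflow.contrib.auth.backends.google_auth",
    }


if __name__ == "__main__":
    # Placeholder paths; point these at your real certificate files.
    env = airflow_env("/path/to/cert.pem", "/path/to/key.pem")
    for name, value in env.items():
        print(f"export {name}={value}")
```

Generate the Fernet key once and store it somewhere safe rather than regenerating it on every deploy, since credentials encrypted with an old key cannot be read with a new one.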

Julien Kervizic
Hacking Analytics

Living at the interstice of business, data and technology | Head of Data at iptiQ by SwissRe | previously at Facebook, Amazon | julienkervizic@gmail.com