Apache® Airflow™ announced as a Top-Level Project — How does ING WB Advanced Analytics use Airflow?

Nikoletta Bozika
inganalytics.com/inganalytics
3 min read · Jan 8, 2019

Today the Apache Software Foundation (ASF) announced Apache® Airflow as a Top-Level Project (TLP). Apache Airflow is a flexible, scalable workflow automation and scheduling system for authoring and managing Big Data processing pipelines of hundreds of petabytes, and it is in use at more than 200 organizations*. At ING WB Advanced Analytics (WBAA), we use Apache Airflow to orchestrate our projects’ workflows, and we contribute back to the project as well.

Apache Airflow is used to easily orchestrate complex computational workflows. Through smart scheduling, database and dependency management, error handling and logging, Airflow automates resource management, from single servers to large-scale clusters.
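To make dependency management and error handling a little more concrete, here is a minimal sketch in plain Python. It is not Airflow's API and not how Airflow is implemented; the function `run_workflow`, the task names, and the retry policy are all hypothetical, chosen only to illustrate running tasks in dependency order, retrying a failure, and logging each attempt.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(tasks, dependencies, max_retries=1):
    """Run callables in dependency order and log every attempt.

    tasks:        maps a task name to a zero-argument callable.
    dependencies: maps a task name to the set of tasks it depends on.
    A failing task is retried up to max_retries times before giving up.
    """
    # Upstream tasks always come before the tasks that depend on them.
    order = list(TopologicalSorter(dependencies).static_order())
    log = []
    for name in order:
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                log.append((name, attempt, "success"))
                break
            except Exception as exc:
                log.append((name, attempt, f"failed: {exc}"))
        else:
            # Every attempt failed, so downstream tasks must not run.
            raise RuntimeError(f"task {name!r} failed after {max_retries + 1} attempts")
    return log


# Hypothetical three-step pipeline: extract -> transform -> load.
calls = {"transform": 0}


def extract():
    pass


def transform():
    # Fails on the first call only, to simulate a temporary glitch.
    calls["transform"] += 1
    if calls["transform"] == 1:
        raise ValueError("temporary glitch")


def load():
    pass


tasks = {"extract": extract, "transform": transform, "load": load}
dependencies = {"transform": {"extract"}, "load": {"transform"}}

log = run_workflow(tasks, dependencies, max_retries=1)
for entry in log:
    print(entry)
```

In this sketch the workflow is ordinary Python data, which is the spirit of defining pipelines as code. A real scheduler such as Airflow adds much more on top, including scheduling over time and distributing work across machines.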

“Since its inception, Apache Airflow has quickly become the de-facto standard for workflow orchestration. Airflow has gained adoption among developers and data scientists alike thanks to its focus on configuration-as-code. That has gained us a community during incubation at the ASF that not only uses Apache Airflow but also contributes back. This reflects Airflow’s ease of use, scalability, and power of our diverse community; that it is embraced by enterprises and start-ups alike, allows us to now graduate to a Top-Level Project.” — Bolke de Bruin, Head of IT at ING WBAA, Vice President of Apache Airflow.

Written in Python, the project is highly extensible and able to run tasks written in other languages, allowing integration with commonly used architectures and projects such as AWS S3, Docker, Apache Hadoop HDFS, Apache Hive, Kubernetes, MySQL, Postgres, Apache Zeppelin, and more. Airflow originated at Airbnb in 2014 and was submitted to the Apache Incubator in March 2016.

“At ING WBAA, we use Apache Airflow to orchestrate our core processes, transforming billions of records from across the globe each day. Its feature set, open source heritage and extensibility make it well suited to coordinate the wide variety of batch processes we operate, including ETL workflows, model training, integration scripting, data integrity testing, and alerting. We have played an active role in Airflow development from the onset, having submitted hundreds of pull requests to ensure that the community benefits from the Airflow improvements created at ING. We are delighted to see Airflow graduate from the Apache incubator, and look forward to seeing where this exciting project will be taken in future!” — Rob Keevil, Data Analytics Platform Lead at ING WBAA

“We rely on Apache Airflow for all our batch data ingestion, making it easy to schedule jobs and retrace our steps.” — Niels Denissen, Data Engineer at ING WBAA

Apache Airflow is developed by The Apache Software Foundation (ASF; http://apache.org), the world’s largest Open Source foundation. Since 1999 the ASF has been developing, shepherding, and incubating 300+ freely-available, enterprise-grade projects that serve as the backbone for some of the most visible and widely used applications in computing today. Through the ASF’s meritocratic process known as “The Apache Way,” more than 730 individual volunteer Members and 7,000+ code Committers across six continents successfully collaborate on innovations in Artificial Intelligence and Deep Learning, Big Data, Build Management, Cloud Computing, Content Management, DevOps, IoT and Edge Computing, Mobile, Servers, and Web Frameworks, among other categories.

The official Apache Airflow press release is available at https://s.apache.org/1p5z

*A list of Apache Airflow known users can be found at https://github.com/apache/incubator-airflow#who-uses-apache-airflow
