The Airflow Summit has a very special place in my heart. I hope I can share that sentiment with whoever reads this, and that by the end of the post you will know why I think the Airflow Summit is a fantastic community-building and learning opportunity that you might also make very good use of.
Why community conferences are special
I would say I am a long-time veteran of building community through events. I caught the “virus” of conference organizing from Wojtek on one of those foggy Polish autumn days in 2013. I am eternally grateful for his “Hey, let’s make a conference on Mobile. I even have a good name: Mobile Central Europe” that he casually teased me with. This was literally the founding moment of the MCE conference, which we ran for 5 years at Polidea and which grew from 300 to 600 participants, gained numerous workshops and after-parties, and became one of the most important international events in the mobile space.
Every single event I organized was different. For example, the last MCE event ended up with Robots and VR connected, with the involvement of NoMagic.ai, where I worked then as a robotics engineer (and the coolest part was that I programmed the robot to play it). And while organizing such a big conference is truly exhausting (people often told me I looked like a zombie in the weeks before every single MCE conference), it was one of the best of my community-building experiences, and I strongly believe you cannot build a strong community without having a good, strong event.
When we decided to stop MCE, it left a huge void in my heart. Stopping it was a good decision (mobile was not as community-driven any more at that time), but I felt that something was missing. In the meantime I started to be active (that’s a bit of an understatement, though) in the Apache Airflow community. That’s why, when Aizhamal wrote on the devlist “Hey, let’s make a conference about Airflow” (coincidentally, it was another foggy day in November 2019), you can imagine what happened next.
The Airflow Summit
Fast-forward three years: we are now a few weeks after the 3rd Airflow Summit, Airflow Summit 2022. And this year, the Summit was again different from the two previous editions. It was an amazing, truly global and distributed event, with more than 7000 online participants and more than 430 people attending local events (and actually having after-parties together!) on all continents: San Francisco, Seattle, New York, São Paulo, Lagos, London, Paris, Warsaw, Tel-Aviv, Bengaluru, Tokyo, Melbourne, Sydney, plus numerous watch-parties all over the world.
I could write a lot about how cool it is to work with the amazing group of people who organized it together (you all rock!), or to praise Software Guru, our event producers, who are hands-down the best community-focused event organizers I have worked with over the many years, or to write how cool it is to get back to meeting people at physical events (I managed to be at both the London and Warsaw ones) after two years of the pandemic!
But not today. Those stories might still come, but today I want to share with you the best things you can learn by watching the talks from the Airflow Summit 2022. If you trust my judgment, for each of the talks I picked I added a short one-sentence statement of what you can learn from it. I put them in the context of the Airflow Summit 2022 video playlist, and you are of course most welcome to also watch those that I did not mention :)
Enough of the intro, it’s not what you came here for.
The learnings from Airflow Summit 2022!
While preparing this post I realized there are a couple of recurring themes you can see in the Airflow community. Some of them are there because we engineered the conference topics (yeah! organizing a conference is very much about engineering the experience, if you did not know that), but the actual themes were brought there by our amazing speakers.
How Airflow works
- Learn from the one and only Ash from Astronomer on why and how Dynamic DAGs are important and what problems they solve.
- If you ever wondered how to make the best use of Kubernetes in Airflow, you can definitely learn now from none other than Jed Cunningham from Astronomer
- Learn what is the super-exciting future (some of it already present!) of the Airflow UI by THE humble Brent Bovenzi who leads the UI effort from Astronomer.
- Learn why we developed one of the most resource- and cost-saving features we have in Airflow, Deferrables and Async IO (available since Airflow 2.2), from their creator and long-time Django contributor Andrew Godwin from Astronomer
- And learn how to actually write more of those deferrables by Ankit Chaurasia from Astronomer
- Learn about the upcoming Multi-Tenancy features of Airflow by myself and Mateusz Henc from Google Cloud
- Learn about all the amazing new features in the biggest release of Airflow 2 — Airflow 2.3 from the legendary Kaxil Naik from Astronomer
- Learn how the upcoming System Tests work under the hood for Airflow by Mateusz Nojek, Bartłomiej Hirsz and Eugene Kosteev from Google Cloud
Make the best use of Airflow
- You can learn about all the many ways you can skip tasks in Airflow (for the benefit of your sanity) by Howie Wang from Apple
- You can find out how you can integrate Jupyter notebooks and data science workflows, and implement custom DAG serialization, with Airflow by Mocheng Guo from Airbnb
- And how you can author your DAGs more easily using Astro Python SDK by Daniel Imberman from Astronomer
- Learn how you can make SLA monitoring for your pipelines actually work by Eden Gluska from Turbine
- You can learn about some surprising uses of Airflow, such as a runner for load testing, by Doron Cohen from Sparkbeyond
- You can learn how you can create a consistent DAG development environment for your whole team like a Boss by Evgeny Shulman from Databand.ai.
- If you want to architect your DAG code and workflows well and keep them in order, you can learn how from Uma Ramadoss
- How to extend your use of Airflow for self-serviced data mesh by Jorrick Sleijster from Adyen
- You can learn how to use Airflow with many other tools at the discussion panel by Brad Kirn, Jitendra Shah, Allessandro Pregnolato, Sarah Johnson
- How you can ingest game telemetry with Airflow in nearly real time by Karthik Kadiam from Warner Bros Games
- Learn why Data Observability and Data Downtime are such hot topics by Barr Moses, the Monte Carlo founder
- Learn why Data Lineage (and especially Open Lineage) is huge for the future of Airflow by Ross Turk from Astronomer
- And again you can learn how Data Lineage and Open Lineage are applied to Airflow from Open Lineage contributors Maciej Obuchowski and Paweł Leszczyński from GetInData.
- And as a follow-up, how you can automate your backfills when data observability, lineage and Marquez are put together by Willy Lulciuc from Astronomer
- Learn how to make pipeline circuit breakers with Airflow by Prateek Chawla from Monte Carlo
Engineering practices in your data pipelines
- Learn how you can engineer your data pipelines with thousands of DAGs and 100s of repositories (!) by Anum Sheraz from Jagex.
- Learn why Data Engineering is a thing and why you should transfer software engineering best practices to data engineering to make your team happy by Leah Cole from Google Cloud
- Learn why and how you need to plug debugging into your modern data pipelines by Francisco Alberini from Monte Carlo
- One more super cool talk on how you can apply the best engineering practices to your Airflow data pipelines (see the recurring pattern again?) by Evan Tahler and Marcos Marx from Airbyte
Managing your Airflow at scale and choosing how to deploy it
- Learn how you should approach choosing the right orchestration tool by Parnab Basak from Amazon Web Services
- Learn how to run Multi-Tenant Airflow at scale (even though we do not yet support multi-tenancy) by Sam Wheating and Megan Parker from Shopify
- Learn how to tame your logging configuration for Airflow — super important, often overlooked topic by Philip Gagnon from Astronomer
- Learnings from running Airflow at scale by John Jackson from Amazon Web Services
- What lessons from the trenches you can get from running Airflow at scale in the Cloud by Rafał Biegacz and Filip Knapik from Google Cloud
- Running Airflow at scale in a huge enterprise as a multi-tenant internal solution by Alaeddine Maaoui and Prekshi Vyas from Société Générale
- How you can solve real-life problems when you run Airflow at huge scale by Ace Haidrey, Yulei Li and Dinghang Yu from Pinterest
- How to make the best use of joining data science/notebook workflows and Airflow at scale with Apache Zeppelin by Jeff Zhang from eBay.
- How you can manage multiple ML models for multiple clients at scale with Airflow by Ori Peri from Riskified
- How you can deploy your Airflow at scale on the Astro platform by Navid Aghdaie from Astronomer
- How you can deploy Airflow internally at huge scale for the multiple repos, projects and teams that Data Science needs by Hamed Saljooghunejad and Siraj Malik from PlayStation
- What’s new in Amazon MWAA by John Jackson from Amazon Web Services
How Airflow, and the Open Source community in general, is a great place to learn and grow
- Learn how you can get involved and earn your ranks in the Open Source community, and grow personally while doing so, by a one-time intern and now PMC member of Apache Airflow, Ephraim Anierobi from Astronomer
- Learn about the many lessons you pick up when becoming a contributor to an Open Source project by Bowrna Prabhakaran, the Outreachy intern who has been working on Airflow for quite some time now.
- Learn about the huge need for exercising your empathy in the community by myself