Running Airflow on ARM M1/M2? Hell yes, but upgrade to Airflow 2.3+

Jarek Potiuk
Published in Apache Airflow
10 min read · Jul 17, 2022


Over the last few months I have had a lot of questions and discussions on the Airflow Slack and in Airflow GitHub Issues about running Airflow on ARM (which really meant running Airflow on Apple Silicon, M1/M2). Does it work? Can we use it? What if I buy new Apple laptops for my users who want to run Airflow locally (mostly for DAG development)?

Does Airflow support ARM/M1?

In short, the answer is:

ARM/M1/M2 + Airflow = Love (if you are using Airflow 2.3+, that is). Image credit: Apple and ARM.

Yes — as of Airflow 2.3.0 there is full development support for ARM devices. The whole development toolchain supports it, and we even have experimental support for running Airflow in Production on ARM. We publish reference, multi-platform images for both ARM and AMD64 platforms.
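Docker normally picks the matching variant of a multi-platform image automatically. If you want to see which variant applies to your machine, a minimal sketch like the one below maps the CPU reported by Python's `platform.machine()` to a Docker platform string. The helper name `docker_platform` and its mapping table are illustrative assumptions, not part of Airflow.

```python
import platform
from typing import Optional

# Illustrative mapping (an assumption, not an Airflow API) from values that
# platform.machine() commonly reports to Docker platform strings:
# Apple Silicon macOS reports "arm64", ARM Linux usually reports "aarch64",
# and Intel/AMD machines report "x86_64" (or "AMD64" on Windows).
_MACHINE_TO_DOCKER = {
    "arm64": "linux/arm64",
    "aarch64": "linux/arm64",
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
}


def docker_platform(machine: Optional[str] = None) -> str:
    """Return the Docker platform matching `machine`, or the current host if omitted."""
    machine = (machine or platform.machine()).lower()
    if machine not in _MACHINE_TO_DOCKER:
        raise ValueError(f"Unsupported machine type: {machine!r}")
    return _MACHINE_TO_DOCKER[machine]
```

On an M1/M2 Mac this returns `linux/arm64`, which you could pass explicitly as `docker pull --platform linux/arm64 apache/airflow:2.3.3`.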

First things first, however. If there is one takeaway from this article you should remember, it is this: if you have not upgraded to Airflow 2.3+ yet and there is even a slight chance your team has M1/M2 Macs, upgrade NOW!

A small caveat: it only natively supports Postgres and SQLite as the metadata DB, because ARM support for MySQL and MsSQL is not yet great. But according to our 2022 survey, Postgres accounts for about 80% of our installation base, so if you are still using MySQL or thinking about using MsSQL, well, guess what?
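As a quick sanity check before developing on an ARM machine, you might look at the scheme of your metadata database connection URL (in Airflow 2.3+ the `sql_alchemy_conn` option lives in the `[database]` section). The helpers below are hypothetical, not an Airflow API: they only parse the SQLAlchemy-style URL with the standard library and flag backends other than Postgres and SQLite.

```python
from urllib.parse import urlsplit

# Backends described above as natively supported on ARM.
ARM_FRIENDLY_BACKENDS = {"postgresql", "sqlite"}


def metadata_backend(conn_url: str) -> str:
    """Return the dialect part of a SQLAlchemy URL, e.g. 'postgresql'."""
    scheme = urlsplit(conn_url).scheme      # e.g. "postgresql+psycopg2"
    return scheme.split("+", 1)[0].lower()  # drop the optional "+driver" suffix


def is_arm_friendly(conn_url: str) -> bool:
    """True if the metadata DB backend is Postgres or SQLite."""
    return metadata_backend(conn_url) in ARM_FRIENDLY_BACKENDS
```

For example, `is_arm_friendly("mysql+mysqldb://airflow:airflow@mysql/airflow")` is `False`, while a `postgresql+psycopg2://...` URL passes.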

For those who previously complained about Airflow being unstable and slow on M1: yeah, it used to be like that, but now you can reap the benefits of developing Airflow and Airflow DAGs on your brand new MacBook.

Is that it? Is this the end of the article?

Well, it could be. If you just want to focus on DAG development on your MacBook, you can stop reading now and go do it. But I think it’s worth understanding a bit of how we got here, what it really means, and what you are missing if you have not yet upgraded to 2.3+ and want to make your Airflow development experience not only better but actually “possible”.

A bit of history

ARM is a very interesting company. Not everyone knows that, unlike Intel or AMD, ARM does not make or sell its own processors. Yep. You heard it right. ARM just designs processors (and, more generally, other chips and system-on-chip (SoC) architectures) and sells licenses to produce them to other companies.

And it’s a pretty successful model. You might not be aware of it, but you almost certainly have an ARM system in your pocket right now: virtually all phones run on ARM-based processors and SoCs licensed by ARM.

Until very recently, ARM’s footprint in “bigger” devices and servers was rather small. A few manufacturers attempted to use ARM in laptops, and Amazon added its Graviton series of servers in 2018/2019, but for all practical purposes personal computing and servers ran on the x86 architecture (Intel/AMD).

Enter 2020

This was, as some say, a year to remember, for a number of reasons; you can still think of many things as “before 2020” and “after 2020”. But 2020 was also a turning point for the ARM architecture: Apple finally released ARM-based versions of its iconic MacBooks (both Air and Pro). It had been long anticipated, and in November 2020 it finally happened.

When I gave my talk about the Production Docker Image at Airflow Summit 2020 (more than 2 years ago, a few months before the ARM MacBooks were released, and, as you might see, delivered straight from my home office), I knew ARM was going to be the next frontier for the image. “ARM support might be the big one” was the last thing I had to say there.

Screenshot from “Production Docker Image” talk at Airflow Summit 2020

What happened in 2021 for Airflow

Not much on the surface. If you wanted to use Airflow on ARM in 2021 you’d be out of options.

As initially suspected, the transition to ARM images took Airflow quite a lot of time. Why? The answer is rather simple if you understand how the modern supply chain for open-source software works. For those of you who do not know, it looks like this:

Credits — Randall Munroe https://xkcd.com/2347/

Airflow sits very much at the top of that picture. We have more than 600(!) open-source dependencies that make it possible to build the most popular truly open-source workflow orchestrator. Managing all those dependencies is a challenge on its own, which I recently gave a whole talk about (not the least of the problems being that Airflow is both a library and an application at the same time). But the problem is that for Airflow to support ARM, ALL of those 600 dependencies must support it. And many of them are transitive: our direct dependencies depend on others, which depend on more dependencies, and so on, all the way down to the operating system and the kernel. And all those dependencies are (as you can see above) thanklessly maintained by people in the open-source community.
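The “ALL of the dependencies” constraint is essentially a reachability question over the dependency graph: a package is usable on ARM only if it and everything it transitively depends on support ARM. Here is a minimal sketch using a made-up graph; the package names are hypothetical and do not reflect Airflow’s real dependency tree.

```python
from typing import Dict, List, Set


def blocking_packages(
    root: str,
    deps: Dict[str, List[str]],
    arm_ready: Set[str],
) -> Set[str]:
    """Return every package reachable from `root` (root included) that lacks ARM support."""
    blockers: Set[str] = set()
    seen: Set[str] = set()
    stack = [root]
    while stack:  # iterative depth-first walk; `seen` also guards against cycles
        pkg = stack.pop()
        if pkg in seen:
            continue
        seen.add(pkg)
        if pkg not in arm_ready:
            blockers.add(pkg)
        stack.extend(deps.get(pkg, []))
    return blockers


# Hypothetical graph, from the application down to a native library.
DEPS = {
    "app": ["web-framework", "data-lib"],
    "web-framework": ["templating"],
    "data-lib": ["numeric-core"],
    "numeric-core": ["native-blas"],
}
```

If every package except `native-blas` is ARM-ready, `blocking_packages("app", DEPS, ready)` returns `{"native-blas"}`: a single leaf deep in the chain is enough to block the application at the top.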

Each of the maintainers (most of whom are not directly paid for their work, but that is a subject for another, future post) needs time and effort to catch up. When you develop a library that is the base for tens or hundreds of thousands of other projects used by billions of people around the world, you had better make any such change carefully. You need to test all the edge cases, run a CI pipeline on ARM devices, go through a few rounds of release candidates, and likely fix some teething problems reported by your “eager” users before you can announce full support. It also means that their own dependencies need CI support, that CI infrastructure providers need to offer free ARM support (many still do not), and that the maintainers themselves need a way to test on ARM hardware (more on that below).

Here is a very, very small fragment of our dependency chain to illustrate it:

A very small fragment of our dependency chain

This is one reason our Production support for ARM is still “experimental”: we are waiting for our users to try it out and report teething problems. For now, our CI only makes sure that the Docker images for ARM build properly, but when we start seeing interest in “server” support we will also have to enable running tests. Many of Airflow’s dependencies, like “numpy”, “pandas” and “scikit”, are a huge part of any data processing toolchain, and they are developed with performance in mind, so they have to be compiled into platform-specific libraries.
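Compiled packages ship one binary wheel per platform, and the platform is encoded in the last tag of the wheel filename (`name-version[-build]-pythontag-abitag-platformtag.whl`). A rough sketch of checking whether a set of release files offers anything installable on ARM could look like this. The heuristic (accept pure-Python `any` wheels and platform tags containing `arm64`, `aarch64` or `universal2`) is an assumption; pip applies much stricter compatibility rules.

```python
from typing import Iterable, List

# Heuristic markers (an assumption) for platform tags usable on ARM machines.
ARM_MARKERS = ("arm64", "aarch64", "universal2")


def platform_tags(wheel_filename: str) -> List[str]:
    """Return the platform tag(s) of a wheel filename."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"Not a wheel filename: {wheel_filename!r}")
    stem = wheel_filename[: -len(".whl")]
    # The platform tag is the last dash-separated field; compressed tag sets
    # join several tags with dots, e.g. "manylinux_2_17_aarch64.manylinux2014_aarch64".
    return stem.rsplit("-", 1)[1].split(".")


def has_arm_wheel(wheel_filenames: Iterable[str]) -> bool:
    """True if any wheel is pure Python ("any") or built for an ARM platform."""
    for name in wheel_filenames:
        for tag in platform_tags(name):
            if tag == "any" or any(marker in tag for marker in ARM_MARKERS):
                return True
    return False
```

With hypothetical files, a release offering only `examplepkg-1.0.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl` fails the check, and adding a `macosx_11_0_arm64` wheel makes it pass.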

So the ARM support has to “bubble up” in the supply chain. And it takes many months.

Therefore, in 2021 I was mostly observing and checking how many of our dependencies still needed to upgrade and, more importantly, which versions of those dependencies we needed to support in order to support ARM. Many dependencies added ARM support only in the newest versions of their libraries (for very good reasons). For us this meant we even had to work with the maintainers of some of our important dependencies to encourage and help them migrate to those newer versions as well (special thanks to the Flask Application Builder, Snowflake, Apache Beam and Google teams for cooperating on that).
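Figuring out “which versions we need to support” boils down to comparing each pinned dependency against the first release that added ARM support. The sketch below uses hypothetical package names and version numbers purely for illustration, with a deliberately naive parser for plain `X.Y.Z` versions; real tooling would use a PEP 440-aware parser such as `packaging.version`.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Naively parse 'X.Y.Z' into a 3-tuple of ints, padding missing parts with 0."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # so "0.4" compares equal to "0.4.0"
    return tuple(parts[:3])


def needs_upgrade(
    pinned: Dict[str, str],
    first_arm_release: Dict[str, str],
) -> List[str]:
    """Return (sorted) packages pinned below the first release that supports ARM."""
    outdated = []
    for name, version in sorted(pinned.items()):
        minimum = first_arm_release.get(name)
        if minimum is not None and parse_version(version) < parse_version(minimum):
            outdated.append(name)
    return outdated


# Hypothetical data, not real package history.
PINNED = {"libfoo": "1.9.2", "libbar": "3.1.0", "libbaz": "0.4"}
FIRST_ARM_RELEASE = {"libfoo": "1.10.0", "libbar": "3.0.0", "libbaz": "0.4.0"}
```

Here `needs_upgrade(PINNED, FIRST_ARM_RELEASE)` flags only `libfoo`, the one dependency whose pin has to move forward before ARM works.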

How bad was the ARM experience?

Bad. Running Airflow was next to useless on M1.

Something else happened too: I decided to buy a second-generation M1 MacBook just to “feel the pain” and be able to test future ARM support for MacBook users. The 2nd-generation MacBook Pro seemed like a great option: Apple finally reverted the TouchBar, no-MagSafe, bad-keyboard and no-HDMI decisions (all of them very wrong, IMHO). There were other factors involved, some of them tax-related (but let’s not talk about that). I knew from the start it would not be “usable” for my Airflow work. But I did not expect what I got. It was unusable. So unusable that for a few months (until I added ARM support to Airflow) I barely used the brand new MacBook at all.

This experience was confirmed and very nicely summarized a few months later when I helped my friend Szymon Nieradka with his Airflow endeavors. Szymon is, hands down, one of the best Project Managers I have worked with. He is one of those PMs with a strong technical background who are not at all afraid to “do stuff” when necessary. So that is what he did on his brand new MacBook in mid-2022 when he tried to use Airflow 2.2.4 (the version used by the company he was managing a huge project for at the time).

Then we had a WhatsApp conversation in which he complained about how slow Airflow was.

Screenshot of WhatsApp conversation with my friend

Sorry for the Polish; for those who do not read Polish, here is my “free translation”:

Szymon:

  • Airflow 2.2.4 “eats up” 100% of 8 cores
  • Airflow 2.3.2 “eats up” 50% of 1 core
  • So I got a 16x boost (not just the 10x that was promised earlier in the conversation; translator’s comment)

Jarek:

  • I told you your jaw would drop (a few floors down; translator’s free interpretation :) )
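The “16x” figure is simple CPU arithmetic: compare the total core-equivalents each version consumed.

```latex
\frac{\text{CPU used by 2.2.4}}{\text{CPU used by 2.3.2}}
  = \frac{8\ \text{cores} \times 100\%}{1\ \text{core} \times 50\%}
  = \frac{8}{0.5}
  = 16
```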

It might not surprise you that the company has since migrated to 2.3.3, and this was one of the big reasons for the migration.

Then came 2022

2022 was another year in which a lot of things happened (and this time they were purely man-made rather than natural phenomena), but for Airflow it was the year ARM support went mainstream. When 2.3.0 was released in April, my MacBook had already been pretty much my main development platform for a few months: all the dependencies that were important to us had caught up. A number of other people were using it for their “main” development as well, and all the initial teething problems had already been addressed. We managed to migrate to the latest versions of those dependencies, and the whole CI and development environment we have (including Breeze, the development environment I developed for Airflow) was prepared for regular ARM releases and daily use. We converted our build toolchain to use buildx, which allowed us to build multi-platform images.
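For readers curious what “using buildx to build multi-platform images” means in practice, the sketch below assembles a `docker buildx build` command that targets both platforms at once. It only builds the argument list and does not run Docker; the helper name and image tag are hypothetical, and this is not the actual command Breeze runs.

```python
from typing import List, Sequence

DEFAULT_PLATFORMS = ("linux/amd64", "linux/arm64")


def buildx_command(
    tag: str,
    context: str = ".",
    platforms: Sequence[str] = DEFAULT_PLATFORMS,
    push: bool = True,
) -> List[str]:
    """Return argv for a multi-platform `docker buildx build` invocation."""
    cmd = [
        "docker", "buildx", "build",
        "--platform", ",".join(platforms),  # build one image variant per platform
        "--tag", tag,
    ]
    if push:
        # Multi-platform results are typically pushed to a registry, since the
        # classic local image store keeps a single platform per tag.
        cmd.append("--push")
    cmd.append(context)
    return cmd
```

You could run it with `subprocess.run(buildx_command("example.org/my-airflow:dev"), check=True)`, assuming a buildx builder that can target both platforms (for example one created with `docker buildx create --use`, plus QEMU emulation for the non-native architecture).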

So we could officially announce ARM support for Airflow 2.3.0:

Docker Hub multi-platform image of Airflow

Looking forward to what the future brings

We have stable CI for development and experimental support for running in Production. What’s next?

First things first: if you are reading this, you are undoubtedly interested in running Airflow on ARM. If you are not on Airflow 2.3+ yet, MIGRATE NOW.

You might think it is not necessary. Every now and then I get the question: how can I improve my Airflow 2.1 or 2.2 experience on M1? Recently I even had a conversation on Slack with a person whose company had assigned them to make the Airflow 2.2.4 they were running ARM-compatible.

The answer is simple: DON’T. It is more effort than migrating to Airflow 2.3.

Migrate NOW. It will be far simpler than making Airflow 2.2 or earlier compatible with ARM. I know what I am talking about: I’ve been working with Airflow dependencies for more than 4 years, and I can assure you the amount of work you’d have to do is vast. And not only in Airflow: you would likely have to fork Airflow and make it use some of the newer dependencies, and you likely won’t avoid forking and fixing some of those dependencies too. If you are using just a small subset of them, you might even succeed, but it will very likely block you from using numpy, pandas, scikit and many other libraries whose ARM support only arrived in newer versions.

It’s just much more straightforward to migrate to Airflow 2.3. Airflow 2 follows SemVer rigorously. Not only are all Airflow 2 releases backwards compatible, but in Airflow 2.3 we also introduced easy ways to move back and forth between versions if you find previously undetected errors that prevent you from migrating. It should be safe and painless to migrate.
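The SemVer promise can be stated as a tiny rule: within the same major version, moving to a later minor or patch release should not break backwards compatibility. The helper below is a hypothetical illustration of that rule, not an Airflow API, and it assumes plain `X.Y.Z` versions without pre-release suffixes.

```python
from typing import Tuple


def _parse(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'X.Y.Z' version; raises ValueError for other shapes."""
    major, minor, patch = (int(p) for p in version.split("."))
    return major, minor, patch


def is_backward_compatible_upgrade(current: str, target: str) -> bool:
    """True if `target` has the same major version and is not older than `current`."""
    cur, tgt = _parse(current), _parse(target)
    return tgt[0] == cur[0] and tgt >= cur
```

Under this rule, going from 2.2.4 to 2.3.3 is a safe, same-major upgrade, while 1.10.15 to 2.3.3 crosses a major version boundary and needs a real migration.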

Then, once most of our users have migrated, I hope it will be smooth sailing.

Google JUST (4 days ago) announced ARM support for GKE on Google Cloud Platform (https://cloud.google.com/blog/products/containers-kubernetes/gke-supports-new-arm-based-tau-t2a-vms), so, inevitably, the server side of the ARM vs Intel/AMD race is just beginning to heat up. You can expect many more announcements in that space in the coming months. And … we are pretty much ready for the revolution there. Our production image already (experimentally) supports ARM, and all the heavy lifting has already happened. The only thing left is to make our CI builds run the full test harness for the ARM images. We could do it even now, but we are waiting until we start getting questions and feedback about it, as it requires a considerable increase in “infrastructure” cost (luckily we have sponsors that make it possible to run our CI workflows).
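If you try the (experimental) ARM image on an ARM-based Kubernetes node pool, the pod has to be scheduled onto ARM nodes. A minimal sketch of the relevant pod-spec fragment, expressed as a Python dict, might look like this. `kubernetes.io/arch` is the standard Kubernetes node label; the toleration assumes GKE taints its Arm nodes with `kubernetes.io/arch=arm64:NoSchedule`, which you should verify in the GKE documentation for your cluster version. The helper name and image tag are illustrative.

```python
from typing import Any, Dict


def arm64_pod_spec(image: str = "apache/airflow:2.3.3") -> Dict[str, Any]:
    """Build a minimal pod spec fragment that pins the pod to arm64 nodes."""
    return {
        "nodeSelector": {
            # Well-known label the kubelet sets on every node.
            "kubernetes.io/arch": "arm64",
        },
        "tolerations": [
            {
                # Assumption: Arm node pools carry this NoSchedule taint.
                "key": "kubernetes.io/arch",
                "operator": "Equal",
                "value": "arm64",
                "effect": "NoSchedule",
            }
        ],
        "containers": [
            {"name": "airflow", "image": image},
        ],
    }
```

Because the reference image is multi-platform, the same tag works on both amd64 and arm64 nodes; the selector only controls where the pod lands.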

So — if you want to use ARM and Airflow together — give it a spin. It’s there, waiting for you (after you migrate to 2.3+ that is).
