Making the Airflow web UI faster

Robbert
Published in datamindedbe
Dec 22, 2023

It’s a nice sunny Wednesday afternoon in Belgium, and my colleague Jan and I decide to go for a walk in the cozy city of Leuven. As we stroll over the cobblestones, he tells me he’s building a terminal UI (TUI) for Airflow. First, because he loves Rust and it’s a good excuse to try out https://ratatui.rs/. Second, because Conveyor’s built-in Airflow web interface feels slow, while the API behind it is fast.

But if the API is fast, can’t we build a web UI that’s fast instead of a TUI? Maybe a good excuse to try https://svelte.dev/? It didn’t take long before I had run `npm create svelte@latest airblast`. I dove right in, but after dabbling a little in Svelte, I took a step back. What do we have to beat? How slow is Airflow’s web interface?

Image attribution: DALL·E 3, prompt “a very fast spinning air turbine”

Measuring Airflow speed

So I went looking and found a nice, 300-line Airflow docker-compose.yml, which spins up Airflow with Postgres and Redis in no time. I run docker-compose up, open localhost:8080, log in with airflow as both username and password, click around a little and… it’s fast. I fire up the developer tools, open the network tab and hit reload: the page loads in about 1500 ms. Not very scientific of me, but if we look at a specific asset, say jquery-latest.js, we only have to wait 22 ms for the server to respond.

Local Airflow: 22ms for jquery-latest.js

Versus how much on Conveyor’s Airflow?

Conveyor’s Airflow: 367ms for jquery-latest.js

On Conveyor, fetching the same jquery-latest.js asset takes 367 ms, more than 16 times as long! Sure, those packets need to cross some actual copper wires, but that is still a long wait for a static file.
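If you want to repeat this comparison outside the browser, a few lines of Python will do. The sketch below only approximates the browser’s “waiting for server response” timing. It opens a fresh connection for every request, so connection setup is included. It also takes the median of a few runs, because single measurements are noisy. The URL is up to you: copy the asset’s URL from the network tab.

```python
import statistics
import time
import urllib.request


def time_to_first_byte_ms(url: str, timeout: float = 10.0) -> float:
    """Milliseconds from sending a GET until the first byte of the body arrives."""
    # Cache-Control: no-cache asks any cache on the way to revalidate with the
    # origin, similar in spirit to ticking "Disable cache" in the browser.
    request = urllib.request.Request(url, headers={"Cache-Control": "no-cache"})
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read(1)  # blocks until the first body byte is in
        elapsed = time.perf_counter() - start
        response.read()  # drain the rest of the body
    return elapsed * 1000.0


def median_ms(url: str, runs: int = 5) -> float:
    """Median over several runs, since a single measurement is noisy."""
    return statistics.median(time_to_first_byte_ms(url) for _ in range(runs))
```

Run it once against your local setup and once against the remote environment with the same asset path, and compare the two medians.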

If you ever decide to go on a journey like this yourself, be warned that your browser will try to cache some files. Tick the Disable cache checkbox in the developer tools’ network tab, and read up on HTTP 304 (Not Modified) responses and ETags. In our case, a 304 took about as long as a 200.
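Why would a 304 be barely faster than a 200? A conditional request only saves the response body. The request itself still travels the whole path to the server, which still has to work out whether the client’s copy is current before it can answer. For a small file, that round trip dominates. The sketch below shows the server-side decision in isolation. The SHA-256-based ETag is an assumption for illustration, not how Airflow or its web framework computes theirs.

```python
from __future__ import annotations

import hashlib


def make_etag(body: bytes) -> str:
    """A strong ETag derived from the content (an illustrative choice)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(body: bytes, if_none_match: str | None) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET that may carry If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # If-None-Match uses weak comparison, so ignore any W/ prefix.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        candidates = [tag[2:] if tag.startswith("W/") else tag for tag in candidates]
        if "*" in candidates or etag in candidates:
            # The client's copy is current: 304 Not Modified, no body.
            return 304, headers, b""
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```

Both branches cost the client one full round trip. The 304 only skips sending the body.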

Static files on a CDN? Watch out!

Files like jquery-latest.js are static: they don’t depend on the logged-in user, the DAGs you’re viewing, or anything else really. So if we know a request is fetching one of these static files, can’t we intercept it early and serve the file right away? Usually you’d use a Content Delivery Network (CDN) for that, such as AWS CloudFront or Cloudflare’s CDN.

However, you have to think about what happens on upgrades. A new Airflow version might ship a changed jquery-latest.js under the exact same filename. When we upgrade, we then risk serving the old static files from the CDN alongside the new dynamic pages from the upgraded Airflow. That might break things. To mitigate this, you could invalidate the CDN’s cache whenever you upgrade. Some applications also include a hash of the content in the filename, like `moment.0fcb6b41ff6a87cf079e.js`, which also solves the problem: new content means a new filename, so a stale cached copy is never requested.
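Content-hashed filenames work because the URL itself changes whenever the content does. A CDN can therefore cache each file indefinitely, and an upgrade simply references new URLs. Below is a minimal sketch of generating such a name. SHA-256 and the 20-character length are assumptions picked to resemble the example above, not what Airflow’s build actually uses.

```python
import hashlib
from pathlib import Path


def hashed_name(path: Path, digest_chars: int = 20) -> str:
    """Return e.g. 'moment.<hash>.js', where <hash> changes whenever the content does."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()[:digest_chars]
    return f"{path.stem}.{digest}{path.suffix}"
```

The catch is that every page referencing the file must be rendered with the new name. That is why this is usually left to the frontend build tool rather than done by hand.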

Applying to Conveyor

So I go to my colleague Stijn, who works on Conveyor, and ask him whether we can serve some of the Airflow UI’s static files through a CDN. He explains that these requests go through multiple proxies and draws a nice diagram so even I can understand it:

Can we shortcut the request path using a CDN?

Conveyor’s architecture means that multiple Airflow versions can run at the same time, so we can’t just put a single CDN in front of everything. Instead, we can serve the static files ourselves, from a static file server right next to Airflow. The packets still have to travel quite far, but at least we take load off the Airflow server.

Static files on a proxy

So let’s spin up a proxy that serves the static files itself and routes every other request through to the Airflow container.

Enter Caddy. You can think of it as nginx, but written in Go. Configured with the following “Caddyfile”, it does exactly what we’re looking for:

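:8000 {
    # Compress responses on the fly (e.g. proxied Airflow pages)
    encode gzip zstd

    # Static files from Flask-AppBuilder
    @appbuilder {
        path_regexp ^/environments/[a-zA-Z0-9]+/airflow/static/appbuilder/.*
    }
    handle @appbuilder {
        # Browsers may cache the file but must revalidate before reusing it
        header Cache-Control "no-cache"
        # Rewrite /environments/<id>/airflow/static/appbuilder/ to /appbuilder/
        uri path_regexp /environments/[a-zA-Z0-9]+/airflow/static/appbuilder/ /appbuilder/
        file_server {
            root /app/static
            # Serve file.gz instead of file when the client accepts gzip
            precompressed gzip
        }
    }

    # Airflow's own compiled static files
    @dist {
        path_regexp ^/environments/[a-zA-Z0-9]+/airflow/static/dist/.*
    }
    handle @dist {
        header Cache-Control "no-cache"
        uri path_regexp /environments/[a-zA-Z0-9]+/airflow/static/dist/ /dist/
        file_server {
            root /app/static
            precompressed gzip
        }
    }

    # Everything else is dynamic: pass it on to the Airflow webserver
    reverse_proxy * localhost:8080
}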

This catches requests for the static files in the appbuilder and dist folders, while everything else is proxied to the Airflow container. You can then put Caddy in a container, copy over the static files from the Airflow image, and even gzip them beforehand.
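To make the routing explicit, here is the decision the Caddyfile makes, re-expressed as a small Python function. It is a sketch for reasoning about and testing the regexes, not something Caddy runs. The `/app/static` root is taken from the Caddyfile. Unlike a real file server, the sketch does no path sanitizing of its own.

```python
import re
from typing import Tuple

# Same shape as the two matchers in the Caddyfile: an environment id, then one
# of the two static folders. Everything else goes to the Airflow webserver.
STATIC_ROUTE = re.compile(
    r"^/environments/[a-zA-Z0-9]+/airflow/static/(?P<folder>appbuilder|dist)/(?P<rest>.*)$"
)
STATIC_ROOT = "/app/static"


def route(path: str) -> Tuple[str, str]:
    """Return ("file", <path on disk>) for static assets, else ("proxy", <path>)."""
    match = STATIC_ROUTE.match(path)
    if match is None:
        return "proxy", path  # forwarded unchanged to localhost:8080
    return "file", f"{STATIC_ROOT}/{match['folder']}/{match['rest']}"
```

Keeping both folders in one pattern is just a compact way to test them together. The Caddyfile’s two separate matchers behave the same.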

That results in the following Dockerfile, with some neat --link tricks from Niels included.

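# syntax=docker/dockerfile:1.4
# Stage 1: the Airflow image whose static files we want to serve
FROM apache/airflow:2.6.3-python3.11 AS airflow

# Stage 2: copy the static folders out of Airflow and gzip them up front
FROM ubuntu AS zipped
COPY --from=airflow /home/airflow/.local/lib/python3.11/site-packages/flask_appbuilder/static/appbuilder /app/static/appbuilder
COPY --from=airflow /home/airflow/.local/lib/python3.11/site-packages/airflow/www/static/dist /app/static/dist
# --keep leaves the originals next to the .gz files, so either can be served
RUN gzip --keep -r /app/static

# Stage 3: Caddy with the precompressed static files and our Caddyfile
FROM caddy:2.7.4-alpine
COPY --link --from=zipped /app/static /app/static
COPY Caddyfile /etc/caddy/Caddyfile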
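`gzip --keep -r` leaves a `.gz` copy next to every original file. The `precompressed gzip` option in the Caddyfile tells file_server to look for such a sibling. Compression therefore happens once, at build time, instead of on every request. The function below is a simplified sketch of that selection logic, not Caddy’s implementation. Its Accept-Encoding parsing only looks at `gzip` and its `q` value.

```python
from pathlib import Path
from typing import Dict, Tuple


def accepts_gzip(accept_encoding: str) -> bool:
    """True if Accept-Encoding lists gzip with a non-zero q value (simplified)."""
    for part in accept_encoding.split(","):
        coding, *params = [p.strip() for p in part.split(";")]
        if coding.lower() != "gzip":
            continue
        for param in params:
            if param.lower().startswith("q="):
                return float(param[2:]) > 0
        return True
    return False


def pick_file(requested: Path, accept_encoding: str) -> Tuple[Path, Dict[str, str]]:
    """Serve <file>.gz with Content-Encoding: gzip when possible, else the original."""
    gz = requested.with_name(requested.name + ".gz")
    # Vary tells any cache in between that the response depends on Accept-Encoding.
    if accepts_gzip(accept_encoding) and gz.is_file():
        return gz, {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
    return requested, {"Vary": "Accept-Encoding"}
```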

So now, whenever you upgrade Airflow, the only thing to remember is to update the Airflow image in the Dockerfile above as well, so that Caddy serves the new files. No cache invalidation or file hashes needed: just bring down both containers and spin them up at the same time.
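Forgetting that second image tag is exactly the kind of mistake a CI check can catch. Below is a hypothetical sketch. `airflow_tag` and `in_sync` are made-up helpers, and the sketch assumes you can get the Airflow image your deployment runs as a plain string, for example from your deployment config.

```python
import re

# Hypothetical: finds the tag in a line like "FROM apache/airflow:2.6.3-python3.11 AS airflow".
AIRFLOW_FROM = re.compile(r"^FROM\s+apache/airflow:(\S+)", re.IGNORECASE | re.MULTILINE)


def airflow_tag(dockerfile_text: str) -> str:
    """Return the apache/airflow tag the Caddy image copies its static files from."""
    match = AIRFLOW_FROM.search(dockerfile_text)
    if match is None:
        raise ValueError("no 'FROM apache/airflow:<tag>' line found")
    return match.group(1)


def in_sync(caddy_dockerfile: str, deployed_image: str) -> bool:
    """True if the Caddy image and the running Airflow use the same image tag."""
    deployed_tag = deployed_image.rsplit(":", 1)[-1]
    return airflow_tag(caddy_dockerfile) == deployed_tag
```

Failing the build when `in_sync` returns False keeps Caddy from serving static files that belong to a different Airflow version.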

If you deploy those in a pod on Kubernetes, that might look something like this in a simplified diagram:

Caddy serves the requests for static files, but relays requests for dynamic files.

Caddy sped up our static file requests by about 100 ms and, perhaps more importantly, freed up resources on Airflow for handling API requests.

Conclusion

You can use this reverse proxy trick for any application to serve static files faster and reduce load on the application. Whether it’s worth it depends on the application, but if you can’t use a CDN, serving your static files with a proxy might still give you the performance boost you’re looking for.

Thanks to my colleague Stijn De Haes for the collaboration on deep-diving into Airflow’s web UI and for proofreading. Thanks to Niels Claeys for proofreading. Thanks to Jan Vanbuel for sparking the idea during the nice cobblestone walks.
