Visualize thousands of dbt models in this 3D DAG viewer
The scale of dbt projects has grown rapidly. According to dbt Labs’ Tristan Handy, speaking at this year’s Coalesce, 5% of dbt projects have over 5000 models, and an even larger share have between 1000 and 5000 models. Once your project reaches that scale, visualizing it with a regular DAG becomes nearly impossible. Clearly, a rethink is needed on the best way to visualize this kind of project.
Recently, I found myself trying to decipher similarly large DAGs whilst wearing a Data Engineering hat. So I decided I’d start to build a Large dbt DAG Visualizer using WebGL.
Try for yourself
Feel free to try it out and paste in your own manifest.json file! All code is run locally (messy open code on my replit).
Why do we need a large DAG visualizer?
Since Large dbt DAG Visualizers didn’t exist before, my aim with this was to see what was possible, and whether others would find it useful.
I’ve been using the development of this to explore what features would help me in my day-to-day use. This includes things like:
- drilling down into data;
- highlighting upstream/downstream flows;
- identifying data issues.
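If you want to experiment with upstream/downstream highlighting on your own manifest, here is a minimal sketch of the underlying traversal. It assumes your manifest.json carries the top-level `parent_map` and `child_map` keys that dbt writes; the `reachable` helper and the tiny `model.shop.*` manifest are made up for illustration, and this is not necessarily how the visualizer itself does it.

```python
from collections import deque


def reachable(start, adjacency):
    """Return every node reachable from `start` by following `adjacency` edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical manifest fragment; a real one would come from json.load(open("manifest.json"))
manifest = {
    "child_map": {
        "model.shop.stg_orders": ["model.shop.orders"],
        "model.shop.stg_payments": ["model.shop.orders"],
        "model.shop.orders": ["model.shop.revenue"],
        "model.shop.revenue": [],
    },
    "parent_map": {
        "model.shop.stg_orders": [],
        "model.shop.stg_payments": [],
        "model.shop.orders": ["model.shop.stg_orders", "model.shop.stg_payments"],
        "model.shop.revenue": ["model.shop.orders"],
    },
}

# Everything that would be affected by a change to stg_orders
downstream = reachable("model.shop.stg_orders", manifest["child_map"])
# Everything that revenue ultimately depends on
upstream = reachable("model.shop.revenue", manifest["parent_map"])
```

The same function handles both directions: pass `child_map` to walk downstream and `parent_map` to walk upstream. The resulting set of IDs is what you would highlight in the graph.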
My experience so far
- It’s been really fun for Exploring Data — especially trying to understand models that I didn’t build (or had forgotten I built 😅)
- Visualizing models with High Centrality — critical models which have a lot of dependencies (children), or depend on a lot (parents)
- Finding a model spatially, and then Drilling down to get more info — Like their Columns and SQL Code
I also wanted to explore how effective this could be for Identifying Issues — So far it’s falling short on that front. I have some basic diffing built in, but am open to other ideas. If you have any ideas, please leave a comment! 🙏
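For anyone who wants to play with the diffing idea, here is one rough approach: compare two manifests and bucket nodes into added, removed, and modified. It assumes each entry under `nodes` carries a `checksum` object with a `checksum` string, as recent dbt manifests do; `diff_manifests` and the sample data are hypothetical, and this is not a description of the diffing built into the visualizer.

```python
def _checksum(node):
    """Pull the file checksum out of a manifest node, or None if it is absent."""
    return (node.get("checksum") or {}).get("checksum")


def diff_manifests(old, new):
    """Bucket node IDs into added, removed, and modified between two manifests."""
    old_nodes = old.get("nodes", {})
    new_nodes = new.get("nodes", {})
    added = sorted(new_nodes.keys() - old_nodes.keys())
    removed = sorted(old_nodes.keys() - new_nodes.keys())
    modified = sorted(
        node_id
        for node_id in old_nodes.keys() & new_nodes.keys()
        if _checksum(old_nodes[node_id]) != _checksum(new_nodes[node_id])
    )
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical before/after manifests for illustration
old_manifest = {
    "nodes": {
        "model.shop.orders": {"checksum": {"name": "sha256", "checksum": "aaa"}},
        "model.shop.revenue": {"checksum": {"name": "sha256", "checksum": "bbb"}},
        "model.shop.legacy": {"checksum": {"name": "sha256", "checksum": "ccc"}},
    }
}
new_manifest = {
    "nodes": {
        "model.shop.orders": {"checksum": {"name": "sha256", "checksum": "aaa"}},
        "model.shop.revenue": {"checksum": {"name": "sha256", "checksum": "bbb2"}},
        "model.shop.customers": {"checksum": {"name": "sha256", "checksum": "ddd"}},
    }
}

changes = diff_manifests(old_manifest, new_manifest)
```

Each bucket could then map to a colour in the graph, which makes it easier to see where in the DAG a change landed.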
About the visualizer
There’s a bunch of settings, so definitely have a play around.
There’s also a work-in-progress version of a more traditional 2D Tree DAG Layout, though personally, for pipelines with 100+ models, I find it much easier to spatially orient myself with a 3D Force Graph.
The GitLab benchmark
GitLab has one of the most notoriously large public dbt projects, so it’s no surprise that a few people mentioned visualizing it when a colleague shared this on LinkedIn. The manifest file is 50MB and there are thousands of models.
I forked the Large dbt DAG Visualizer and made a GitLab Visualization — just to see what it’d be like.
While it’s certainly possible to visualize (with fewer visual effects and more optimized code), I found that everything is so interconnected in GitLab’s DAG that it can be hard to navigate. I’ll need to think of better ways to navigate such interconnected pipelines 🤔.
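One optimization that may help with manifests this size is to strip them down to just the graph before rendering. The sketch below keeps only model IDs, names, and model-to-model edges, using the `resource_type` and `depends_on.nodes` fields from manifest.json. The nodes/links output shape is my assumption about what a force-graph renderer would want, and `to_graph` is a hypothetical helper, not code from the visualizer.

```python
import json


def to_graph(manifest):
    """Reduce a manifest to a lightweight {nodes, links} structure of models only."""
    all_nodes = manifest.get("nodes", {})
    nodes = [
        {"id": node_id, "name": node.get("name", node_id)}
        for node_id, node in all_nodes.items()
        if node.get("resource_type") == "model"
    ]
    model_ids = {n["id"] for n in nodes}
    links = []
    for n in nodes:
        depends_on = all_nodes[n["id"]].get("depends_on", {})
        for parent in depends_on.get("nodes", []):
            if parent in model_ids:  # drop edges to sources, seeds, etc.
                links.append({"source": parent, "target": n["id"]})
    return {"nodes": nodes, "links": links}


# Hypothetical manifest fragment; a real one would be loaded with json.load(...)
manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "name": "stg_orders",
            "resource_type": "model",
            "raw_code": "select * from raw.orders",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        },
        "model.shop.orders": {
            "name": "orders",
            "resource_type": "model",
            "raw_code": "select * from stg_orders",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
        "test.shop.not_null_orders_id": {
            "name": "not_null_orders_id",
            "resource_type": "test",
            "depends_on": {"nodes": ["model.shop.orders"]},
        },
    }
}

graph = to_graph(manifest)
slim_json = json.dumps(graph)  # far smaller than the full manifest for large projects
```

Columns and SQL could then be fetched from the full manifest only when someone drills into a specific model, rather than being held for every node up front.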
If you want to visualize dbt manifests as large as GitLab’s 50MB manifest, let me know!
A work in progress
Try out the Large dbt DAG Visualizer on your own manifest and let me know your thoughts! (Runs locally, and all code is open on my replit.)
This is very much a work-in-progress. There are so many things I want to improve, and I’d love some help figuring out where to focus 😅.
If you have any ideas, please leave a comment. 😁