Data Mesh, Explained As a Three-Ring Circus

Alex Gnibus
Jan 2, 2024
Photo by Ingo Ellerbusch on Unsplash

Is it an architecture? Is it a technology solution? A social construct? Data mesh is a little bit of all of the above, causing much befuddlement in the data industry as we all try to get up to speed on what the heck it is.

If you’re reading this, chances are you’ve been befuddled by data mesh before. You’ve likely already read other explainers that didn’t simplify it enough, making data mesh more mysterious than it needs to be.

I’ve been down that road. And since I couldn’t find a simple explainer, I figured I’d write it.

So here’s my take on data mesh. We’re going to the circus.

A three-ring circus is a useful analogy for data mesh, because the circus shares a common goal with data mesh: organizing the chaos of multiple acts happening at the same time.

The challenge: Putting the spotlight on the performers (the business domain experts)

When it comes to getting value from data, who should be the star of the show? The people who know the data best.

Many agree that data analytics should be owned by the functional line-of-business experts who understand the data and the use case. You’ve heard it before: Make the right data available, to the right people, at the right time.

But that’s easier said than done, right? Especially as our data sources grow, use cases become more diverse and the systems we use to work with data become more technical.

Data mesh is a framework for solving this challenge. The creator of the concept, Zhamak Dehghani, has called it a “sociotechnical approach.” It’s a way of thinking about the technology architecture choices that will best spotlight your performers.

In other words, it’s like designing a good circus!

Below are the four key concepts of data mesh that I’ll walk through as we build our circus:

  • Domain-oriented data ownership and architecture
  • Data as a product
  • Federated computational governance
  • Self-serve data infrastructure as a platform

A three-ring circus: Domain-oriented ownership and architecture

Imagine all of your circus acts going on in just one ring. You’ve got the trapeze artists, jugglers, dancers and acrobats all crowded in the same place. It might work while your circus is small, your acts are simple, and you’re just getting your show off the ground.

But as you get more performers and add more sophisticated acts, it becomes utter chaos. And there’s just one ringleader running the show.

In the data world, that’s like having all of your different data in a monolithic data lake architecture, with a single team of technical data experts managing it all.

A three-ring circus has three performances happening at the same time, much like multiple departments in a business simultaneously using data. And in this circus, each ring runs its own show.

That’s data mesh.

In a data mesh culture, each ring of the circus is self-service. The performers each get their own resources. The domain experts own their data pipelines — from ingesting the data, to processing it, to analyzing it and creating the final datasets and insights.

Get ready for my Google Drawings interpretation:

Each ring puts on its own act — AKA, the data pipeline. Illustrated (poorly) by the author.

Sharing between rings: Data as a product

Imagine that a performer in the acrobat ring wants to borrow trapeze equipment from the performers in the aerial ring. Or the jugglers want to use the same music for their routine as the contortionists.

Similarly, your operations team may want to use a data asset from the finance team. To make the data shareable, you need it to be discoverable, trustworthy, self-describing, addressable and interoperable. In other words, it needs to be treated as a product.
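
To make those five qualities a bit more concrete, here’s a minimal Python sketch of what a data product “label” might look like. Everything in it (the `DataProduct` class, its field names, the `warehouse://` address) is my own illustration and assumption, not a standard or any vendor’s API.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes alongside a dataset."""
    name: str
    owner_domain: str
    address: str = ""        # addressable: a stable location to point at
    description: str = ""    # discoverable: text a catalog can search
    schema: dict = field(default_factory=dict)          # self-describing: column -> type
    quality_checks: list = field(default_factory=list)  # trustworthy: checks the owner runs
    file_format: str = ""    # interoperable: a format other teams can read

    def missing_traits(self) -> list:
        """Return the product qualities this descriptor does not yet cover."""
        traits = {
            "discoverable": self.description,
            "trustworthy": self.quality_checks,
            "self-describing": self.schema,
            "addressable": self.address,
            "interoperable": self.file_format,
        }
        # An empty string, dict, or list counts as "not provided yet".
        return [trait for trait, value in traits.items() if not value]


# The finance team's asset, fully described (all values are made up).
finance_revenue = DataProduct(
    name="monthly_revenue",
    owner_domain="finance",
    address="warehouse://finance/monthly_revenue",
    description="Cleansed monthly revenue by region",
    schema={"region": "string", "month": "date", "revenue": "decimal"},
    quality_checks=["no_null_regions", "revenue_non_negative"],
    file_format="parquet",
)

# A raw export with nothing filled in is a dataset, not yet a product.
draft = DataProduct(name="scratch_export", owner_domain="operations")

print(finance_revenue.missing_traits())
print(draft.missing_traits())
```

The point of the sketch is the checklist, not the class: if the operations team can’t find, trust, understand, locate or read the finance team’s data, it isn’t a product yet.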

This concept is “data as a product,” and it’s important for data mesh because it’s the “mesh” part: the connections between domains.

The “product” can be a variety of things that are useful for the end consumer — it could be a table of customer records that’s been cleansed and transformed, ready for someone to pop into a dashboard. It could be metadata. Or code. Or dashboards.

In a data mesh culture, the domain experts provide the data assets that will ultimately be consumed by other domains across the enterprise. The goal is data sharing at scale — which is becoming increasingly important as we do things like train machine learning models across multiple dimensions.

Each ring can share with other rings.

Under the big top: Federated computational governance

Even though different performers are in their own rings, all of the rings are still under the same tent!

A decentralized data mesh doesn’t mean rebuilding silos or throwing out the benefits of centralized governance, security and access. It means there’s a shared responsibility between the domains and the central IT team for handling the various aspects of governance: security, data quality, regulatory compliance, availability, and more.

For instance, IT might own data lineage and authentication, while domain experts own data quality and classification.
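
The “computational” part of the name suggests writing that split down in a form a machine can check. Here’s a small Python sketch of the idea, using the example split above. The names (`OWNERSHIP`, `owner_of`, `unassigned`) are hypothetical, and a real setup would live in your platform’s governance tooling rather than a dictionary.

```python
CENTRAL_IT = "central IT"
DOMAIN = "domain team"

# Who owns which governance concern (an assumed split, following the
# example in the text; your organization's split may differ).
OWNERSHIP = {
    "data lineage": CENTRAL_IT,
    "authentication": CENTRAL_IT,
    "data quality": DOMAIN,
    "classification": DOMAIN,
}


def owner_of(concern: str) -> str:
    """Look up who is responsible for a governance concern."""
    return OWNERSHIP.get(concern, "unassigned")


def unassigned(concerns: list) -> list:
    """Return the concerns that nobody has claimed yet."""
    return [c for c in concerns if c not in OWNERSHIP]


# Concerns the organization has decided it must cover.
required = [
    "data lineage",
    "authentication",
    "data quality",
    "classification",
    "regulatory compliance",
]

print(owner_of("data quality"))
print(unassigned(required))
```

Running a check like this surfaces the gaps: in this made-up example, regulatory compliance has no owner yet, which is exactly the kind of thing you want to catch before the show starts.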

You can use a data warehouse, data lake and/or data lakehouse as the technological foundation of your data mesh. Technologies like the Snowflake Data Cloud, Databricks Lakehouse, Microsoft Fabric and more can promote a data mesh culture because they facilitate self-service access to storage and compute resources, while providing centralized governance.

A good example is Databricks Unity Catalog, which provides centralized governance and data discovery across Databricks workspaces (a workspace is a Databricks environment for a set of users, like a ring of the circus).

Everything goes under the big top. Or, you know, a lakehouse.

Your performers need a stage: Self-service infrastructure

While modern cloud infrastructure has benefits, it’s also not designed for the business user. Operating a modern data stack requires specific technical skills, such as fluency in coding languages.

So you need an accessible self-service platform that the business experts can use for their data pipeline, with tools for storage and access, compute, transformation, machine learning and visualization. Ideally one with a visual, no-code UI (my Alteryx enthusiasm might be showing here).

Think of this platform as a literal platform: the stage under the big top tent.

No one wants an empty stage, but that’s what will happen if the performers don’t like it. And then you’ve got what I call an unusable data stack.

So whichever platform you choose, make sure it supports what each domain needs. This stage needs to be big enough to scale with the number of performers, accessible enough for every performer to use it, and flexible enough to adapt to different acts.

Once your performers have a stage to perform on, your circus is complete!

The finale

Building a domain-driven data culture can feel like a high-wire act, balancing business needs with IT needs. You might be tempted to see data mesh as your safety net, a guarantee that your data problems will be solved.

But data mesh implementations often fail (and some analysts are already proclaiming data mesh a dying trend) because people expect too much of data mesh. It’s not one technology or cure-all. It’s not the big top tent or the flying trapeze or the stage. Data mesh is a big-picture guide, and its value comes from using it as a reference when you’re making decisions about your people and technology.

Whether you’re the business analyst in the ring, or the engineer in charge of setting up the big top, data mesh can be a helpful framework to keep in mind as you run your circus.

I’ll call it a win if this explainer left you less befuddled about data mesh than before. Let me know what you took away!

’Til next time,

Alex
