Data mesh: a true paradigm shift?

Published in Balderton · 7 min read · Mar 7, 2023

Data mesh, data governance, data fabric, data access management, lineage, observability, orchestration… The ‘governance’ layer of the modern data stack has been attracting growing attention and debate (and confusing terminology).

Whilst the concept of data governance is not new, the emergence of data mesh — a federated, decentralised approach to data governance, coined by Zhamak Dehghani in 2019 — is more recent. This has led organisations to rethink their orchestration and observability layers, and inspired a wave of innovation in the space. But is data mesh just another trend, or does it represent a true paradigm shift?

In order to answer this question and identify the most exciting opportunities for investment, let’s start with the basics and break down the fundamentals of data governance. We’ll look at:

Getting the lay of the land: Data mesh and data governance
How data orchestration, observability and access management tie in with the data mesh
Considerations when assessing emerging players

In the piece below, I unpack what’s actually happening, and what it means for investors. Please feel free to share your thoughts and questions in the comments.

1. Getting the lay of the land: Data mesh and data governance

We can think of data governance as a broad umbrella covering who should manage access to data and data pipelines, and how those pipelines are monitored and shared. Put simply, data governance means setting internal standards for how data is gathered, stored, processed, and disposed of.

The data mesh is a federated approach within data governance: a distributed, decentralised way of managing enterprise data. It treats datasets as products, orientated around domains. The idea is that each domain-specific dataset has its own engineers and product owners to manage it. This in turn allows for a level of self-service across an organisation.

With data mesh, a team could be composed of software engineers, analysts and data scientists, all working on building operational and analytical data products.

So data mesh could be interpreted as the opposite of current data platforms, which are more centralised and often built around complex pipelines. But whilst this federated approach removes the blockers that come with centralised systems, it can still be used alongside traditional storage systems, like data warehouses or data lakes. It simply means that use has shifted from a single, centralised data platform to multiple decentralised data platforms.

Why the growing need? According to studies, only 32% of IT leaders realise tangible value from data, and 77% of them integrate up to five different types of data in their data pipelines. 65% of organisations use at least 10 different data engineering tools. And 94% of organisations would like to deploy a data catalogue, but only around one-third say that their data catalogue has met their expectations.

In this light, we can break data mesh down into four main sub-categories:

Domain-driven ownership of data: increasing autonomous nodes on the data mesh
Data as a product: distributing high-quality data in a secure way
Federated computational governance: aggregating data products, with high computational standards
Self-serve data platform: consuming data products autonomously
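To make these four sub-categories more concrete, here is a minimal sketch in Python of how they might show up in code. Everything in it is an assumption for illustration: the `DataProduct` fields, the two governance rules and the `Registry` class are hypothetical and do not come from any particular data mesh tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor for one domain-owned data product."""
    name: str
    domain: str              # domain-driven ownership
    owner: str               # the team accountable for the product
    schema: dict[str, str]   # column name -> type, the published contract
    tags: set[str] = field(default_factory=set)


def governance_violations(product: DataProduct) -> list[str]:
    """Federated computational governance: shared rules expressed as code."""
    violations = []
    if not product.owner:
        violations.append("missing owner")
    if not product.schema:
        violations.append("missing schema")
    if "pii" in product.tags and "access-controlled" not in product.tags:
        violations.append("PII product must be access-controlled")
    return violations


class Registry:
    """Self-serve platform: consumers discover products without a central team."""

    def __init__(self) -> None:
        self._products: dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Data as a product: nothing is published unless it passes the rules.
        violations = governance_violations(product)
        if violations:
            raise ValueError(f"{product.name}: {', '.join(violations)}")
        self._products[product.name] = product

    def by_domain(self, domain: str) -> list[str]:
        return sorted(p.name for p in self._products.values() if p.domain == domain)


registry = Registry()
registry.publish(
    DataProduct("orders", "sales", "sales-data-team", {"order_id": "string"})
)
```

The point of the sketch is that each domain team publishes its own product, while the rules every product must pass are agreed once and applied automatically.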

2. How data orchestration, observability and access management tie in with the data mesh

The data mesh is complex to navigate, so we need specific orchestration and observability players to keep up with the wave of decentralisation that data mesh offers. But what do we mean when we talk about orchestration, observability and access management, and why do they matter?

Data orchestration is the process of taking siloed data from multiple data storage locations, organising it, and making it easily accessible for analysis.

In this sense, we can group lineage under data orchestration — the process of understanding data as it flows through complex pipelines, from sources to consumption.
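As a rough illustration of how orchestration and lineage relate, the sketch below models a pipeline as a dependency graph. The dataset names are made up, and using Python's standard-library `graphlib` is my own choice for the example, not how any of the vendors mentioned here work internally.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each dataset maps to the datasets it is built from.
UPSTREAM = {
    "raw_orders": set(),
    "raw_customers": set(),
    "clean_orders": {"raw_orders"},
    "customer_orders": {"clean_orders", "raw_customers"},
    "revenue_dashboard": {"customer_orders"},
}


def run_order(upstream):
    """Orchestration: an order in which every dataset is built after its inputs."""
    return list(TopologicalSorter(upstream).static_order())


def downstream_of(dataset, upstream):
    """Lineage: every dataset that directly or indirectly consumes `dataset`."""
    affected = set()
    frontier = [dataset]
    while frontier:
        current = frontier.pop()
        for name, inputs in upstream.items():
            if current in inputs and name not in affected:
                affected.add(name)
                frontier.append(name)
    return affected


order = run_order(UPSTREAM)
impacted = downstream_of("raw_customers", UPSTREAM)
```

Here `order` is a valid build sequence for the pipeline, and `impacted` answers the typical lineage question: if `raw_customers` breaks, which downstream datasets are affected?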

Companies in this data orchestration category include dbt and Balderton portfolio company Kili Technology, as well as companies focused on extraction and ingestion like Fivetran and Airbyte. See my colleague Sivesh’s post for visual representations of this value chain.

[Figure: examples of key players focused on data orchestration]

Meanwhile, data observability refers to an organisation’s ability to fully understand the health of the data in its systems. It works by applying DevOps observability best practices, such as automated monitoring and alerting, to eliminate data downtime. It can be thought of in five main sub-categories:

Freshness: How up-to-date are the data tables, and at what cadence are they updated?
Schema: How is data organised and have there been changes?
Volume: Are the data tables complete?
Distribution: Is the data within an accepted range, and can it be trusted?
Lineage: How and why has the data moved over time (upstream and downstream)? Which teams are generating the data and who is accessing it? How is metadata collected?
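The first four of these checks can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions: the function name, its parameters and the thresholds are hypothetical, and real observability tools typically learn thresholds from history rather than hard-coding them. Lineage is left out because it needs the pipeline graph rather than a single table.

```python
from datetime import datetime, timedelta


def check_table(rows, last_updated, now, *, expected_columns, min_rows,
                max_age, value_column, value_range):
    """Return a dict of check name -> True (healthy) or False (raise an alert)."""
    low, high = value_range
    return {
        # Freshness: was the table updated recently enough?
        "freshness": now - last_updated <= max_age,
        # Schema: does every row carry exactly the expected columns?
        "schema": all(set(row) == expected_columns for row in rows),
        # Volume: did we receive at least the expected number of rows?
        "volume": len(rows) >= min_rows,
        # Distribution: are the values inside the accepted range?
        "distribution": all(low <= row[value_column] <= high
                            for row in rows if value_column in row),
    }


now = datetime(2023, 3, 7, 12, 0)
rows = [
    {"order_id": 1, "amount": 20.0},
    {"order_id": 2, "amount": -5.0},   # a negative amount is out of range
]
report = check_table(
    rows,
    last_updated=now - timedelta(hours=30),   # older than the 24h allowance
    now=now,
    expected_columns={"order_id", "amount"},
    min_rows=2,
    max_age=timedelta(hours=24),
    value_column="amount",
    value_range=(0.0, 10_000.0),
)
```

In this example the table fails the freshness and distribution checks while passing schema and volume, which is the kind of signal an observability tool would turn into an alert.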

[Figure: examples of key players focused on data observability]

Finally, data access management players help manage access to data, making it more secure, usable and available. Players like Atlan, Alation, Privitar and BigID allow for data cataloguing. Data catalogues are collections of metadata, combined with search and data management tools, that help users find the data they need within an organisation. Each of these businesses focuses to a greater or lesser degree on one of the four data mesh sub-categories above - they can provide data discovery tools aimed at helping users understand the context around data, connect to data warehouses and business intelligence tools, update data documentation, etc.

See here for a more detailed comparison of feature differences.
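To illustrate the catalogue idea described above (metadata plus search), here is a toy sketch in Python. The `CatalogueEntry` fields and the sample entries are invented for the example, and the naive substring search stands in for the much richer discovery features that real catalogue products offer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical metadata record for one table."""
    table: str
    owner: str
    description: str
    tags: tuple[str, ...] = ()


def search(entries, query):
    """Return table names whose metadata mentions every word in the query."""
    words = query.lower().split()
    hits = []
    for entry in entries:
        haystack = " ".join(
            [entry.table, entry.owner, entry.description, *entry.tags]
        ).lower()
        if all(word in haystack for word in words):
            hits.append(entry.table)
    return hits


CATALOGUE = [
    CatalogueEntry("sales.orders", "sales-data-team",
                   "One row per customer order", ("finance",)),
    CatalogueEntry("marketing.leads", "growth-team",
                   "Inbound leads from web forms", ("pii",)),
]

order_tables = search(CATALOGUE, "customer order")
pii_tables = search(CATALOGUE, "pii")
```

A user who does not know where order data lives can search by description, while a security team can search by tag to find every table flagged as containing personal data.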

[Figure: examples of key players focused on data access management]

Therefore, data mesh offers a more agile way of combining observability, orchestration and access management — rather than waiting for the perfect data warehouse or data lake. With data mesh, one can more flexibly adapt to changing data sources and create multiple data products.

In this way, we can think of data mesh as a paradigm shift given its revolutionary decentralised approach — both in terms of team structure, but also the way data is fundamentally stored, accessed and shared.

3. Considerations when assessing emerging players

Now that we have a clear understanding of what makes up the data mesh, we can look into companies innovating in the vertical and where they fit in the paradigm shift.

Can these players exist as stand-alone businesses, or are they merely integrated layers on top of existing data warehouses?

A number of companies are innovating in the orchestration and data access management space in Europe:

Castor (France, Series A)
Raito (Belgium, Seed)
Y42 (Germany, Series A)
Wayfare (Denmark, Pre-Seed)

A number of recent players are also innovating in data observability:

Soda (Belgium, Series A)
Sled (Germany, Pre-Seed)
Sifflet (France, Seed)
Validio (Sweden, Seed)

These companies each have a different speciality, e.g. Castor centres on a Notion-like data catalogue with 15-minute onboarding; Raito automates data access requests and pipelines with a focus on privacy and security; Y42 automates pipelines with any modelling language of choice; Sifflet focuses on metadata monitoring and ML-based anomaly alerting; and Soda offers an open-source framework for scanning data from the command line.

In an ecosystem that is already crowded, I believe one of the keys to success for many of these startups will be their ability to identify and leverage their key differentiator: Is it data quality? Security? Administration? Data preparation for modelling? One good example is Stemma in the US, which focuses on building a self-serve data culture within data orchestration.

I believe the longer-term success of many of these will also depend on their ability to master sales, and effectively partner with larger players (Collibra, Monte Carlo, dbt, etc.).

4. Final Thoughts

To conclude, data mesh is taking off as a solution to IT leaders’ most challenging pain points. While the space is getting increasingly crowded and harder to navigate, players specialised in orchestration, observability and access management are innovating at pace.

Not every company will make it. Success will largely depend on product differentiators, and on the strength of partnerships and integrations with larger DataOps/MLOps players.

Stay tuned for my next article on this topic, where we’ll be looking more closely at data lineage, reverse ETL, and the metrics that matter most.

If you are a technical investor, founder, or operator, please feel free to send your thoughts to mwehr@balderton.com or leave feedback in the comments.

5. Further resources

1. https://preset.io/blog/reshaping-data-engineering/

2. https://www.ibm.com/topics/data-mesh

3. Data Mesh Whitepaper | Starburst

4. https://atlan.com/what-is-data-mesh/

5. https://www.dremio.com/resources/guides/what-is-a-data-mesh/

6. https://www.collibra.com/us/en/blog/data-mesh-101-a-straightforward-overview-of-the-hottest-topic-in-enterprise-data