Well, here’s another nice mesh you’ve gotten me into

Chris Jackson
12 min read · Sep 20, 2021

Why consider the data mesh?

I ‘got into’ data in the early 1990s and have spent most of my career as an information architect, building real-world stuff as a solution architect or product architect. This was for both operational and analytic use cases, using both centralised and distributed data architectures. I’ve also spent time in data governance and strategy, especially around modelling data at the business level.

So for each new round of thinking in the data space, rather than jump on the latest bandwagon, I’ve tried to understand the real-world challenges which triggered the approach, how the new ideas might actually solve those challenges, and what the upsides and downsides might be.

Most recently, I was intrigued when the idea of a ‘data mesh’ started moving from concept into reality in early adopters (even if some of those adopters may not have called it that when they started). What does it bring to the party? How does it ‘mesh’ (sorry) with what I’ve seen in the leading data organisations I’ve worked with, both previously and while I’ve been at Snowflake?

The problems data mesh tries to address have significant organisational, logical, and technical components. There is a risk that, as often happens with new ideas, the harder organisational and logical issues will be ducked in favour of buying new tech or hiring new consultants to ‘sort’ things. (In which case, expect to see tech vendors or consultants realign their marketing to play the same game.) We need to tackle the harder challenges, without neglecting to consider which technologies might best enable the vision, by providing facilities which actually exist and don’t cost a relative fortune to build or run.

I can see at least three possible approaches and outcomes for data mesh in an organisation:

  1. Data mesh is used to justify the existing data silos, with a bit of technology, such as virtualisation, thrown over the top to ‘create a mesh’. Senior management are reassured that they are ‘on message’ with new data thinking and hard questions are avoided for another few years. Users see marginal benefit.
  2. At the other extreme, data mesh becomes another technically-led white elephant project (think enterprise-wide 3NF models in the 1990s, Hadoop lakes in the 2010s), where the provision of a complex ‘complete’ framework becomes the primary focus of very clever people for several years, before benefits for users and the business start to flow (if ever).
  3. The data mesh initiative causes the participants to consider whether the problems outlined in the data mesh manifesto apply to their organisation, whether the identified causes are in fact true in their case, and what the business value will be of addressing those issues, using data mesh or another approach. That new approach is tackled by a combination of organisational and logical improvements, underpinned by appropriate tech. Users and the business see direct benefits, delivered incrementally.

Option 3 is not just the best outcome, but a feasible one, and I happen to believe that Snowflake is a particularly good platform on which to implement data mesh principles. But I’d like to start by looking at data mesh as I understand it, where I agree with or question its assumptions, and some other issues I believe need more consideration for its rounded application.

The Mesh Manifesto…

The Data Mesh concept in its current form was outlined by Zhamak Dehghani in articles at https://martinfowler.com/articles/data-monolith-to-mesh.html and https://martinfowler.com/articles/data-mesh-principles.html, with a book also forthcoming. There are several excellent responses out there on the Web, and I’d especially recommend those from people who have implemented similar ideas on the ground, using a variety of technical approaches.

Here’s my interpretation of the key challenges and causes which are proposed by the Data Mesh manifesto:

  • There has been a great divide between operational data - where domain-based, microservice thinking has become key - and analytic data solutions - where a centralised, monolithic model remains.
  • Moving data between operational and analytical systems via ETL is fragile, due not only to this monolithic approach but also to an explosion in the number of source systems and a lack of SLAs from those operational systems.
  • The monolithic approach leads to bottlenecks, due to governance and engineering processes which have little domain understanding and attempt to make ‘one size fits all’. Tools such as MDM have also acted as a bottleneck, not an enabler.
  • The data warehouse exemplifies the monolithic approach, but data lakes have in practice been no better due to their complexity, leading to them being managed by highly skilled gurus. Current cloud data platforms are also monolithic.

The manifesto proposes an approach drawing on the services paradigm of operational systems to enable analytics to be broken down. The key principles of this approach are:

  • Domain-oriented ownership: This is more than a logical structuring, and proposes that organisation and engineering be aligned to the domains. They should also be provided with domain-oriented data storage and compute. Different types of domains are recognised - source-oriented, aggregate and consumer-focused.
  • Data as a product: The internal workings of a domain remain under its own control, but the interfaces between domains are data products. To be useful, the data made available to other domains should be discoverable, addressable, trustworthy, self-describing, interoperable and secure. (This list is similar to the ‘FAIR’ principles discussed for example in life-sciences.)
  • Self-service infrastructure as a platform: For this domain-based approach to work, and in order to remove the dependence on centralised gurus, the underlying data platform or infrastructure needs to provide a very broad set of common, easy-to-use capabilities and tools, ranging from storage to access control to lineage to cataloguing.
  • Federated governance: This new approach should not be a free-for-all but needs cross-organisation governance to allow autonomy whilst ensuring that the scope of domains is agreed, and common technical standards, compliance and security are maintained.

Does it make sense, and will it work?

First, the four key principles make absolute sense, even if implementing them is challenging in practice. Many leading data organisations have been trying to work this way for up to a decade, and what data mesh is offering is one articulated way of achieving those principles.

Second, let’s look at some of the assumptions on which the data mesh is based:

I’m not sure that micro-service-based applications are as ubiquitous as implied by Dehghani, though an operational stack based on an assembly of online services is becoming more common. When it comes to data-oriented operational applications, there are still a lot more traditional OLTP and NoSQL solutions out there, for good reasons. However, many operational use cases are indeed owned by teams with a limited interest in the wider downstream use of their data. In leading data organisations by contrast, the implications of that downstream data flow are understood, at least in principle, by those generating the data.

In traditional IT shops, but also in some newer ‘data engineering heavy’ organisations, a centralised funnel or a culture of ‘expert gurus’ managing complex technology can slow down innovation. There is sometimes a fine line between adequate cross-organisation governance and project-stopping bureaucracy. However, leading organisations have been adopting ‘agile BI’ practices for around ten years, working on domain-scoped projects and product-based thinking, so there is already plenty of alternative experience out there, implemented on a range of technologies.

To pin most of the blame for ETL complexity on monolithic platforms would be unfair in my view. True, running any data project without the involvement of those who understand the data is a recipe for disaster. But the ugliest, most fragile ETL I’ve seen in BI arises from data silos at the source end feeding other data silos in the middle, feeding further silos at the target end, with lineage uncertain, and a dangerous, untested reliance downstream on the meaning and accuracy of previous transformations, or even on components which might disappear overnight. One of the bottlenecks of monolithic approaches is precisely the governance intended to avoid this mess, though the inertia of these approaches, and the limitations on the volume and range of data supported by their underlying platforms, have led to shadow solutions which in turn led to the ETL spaghetti.

A significant issue is the extent to which ETL tools themselves became the unintended ‘owners’ of data products. In the era of constrained, relational-only data warehouses, incoming data had to be transformed and reduced in size before landing. With early data lakes the management capabilities were so appallingly limited that ETL tools often took on the role of guaranteeing any level of ‘data goodness’.

An assertion that modern cloud data platforms are (all) monolithic is one I would heavily challenge. Sure, if you just port on-prem tech to the cloud you also port its constraints. And of course, you can implement a ‘monolithic’ approach on top of a platform which actually offers a wider range of options. But for me, one of the main attractions of a platform like Snowflake is precisely the flexibility it offers in implementation approaches - the benefits of a single platform without the traditional constraints on compute or storage or the number of different workloads or domains which can be independently handled.
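
To make that less abstract, here is a minimal sketch in Snowflake-style SQL of what domain-oriented storage and compute on a single platform can look like. The domain, database and warehouse names are hypothetical, and this is an illustration of the principle rather than a recommended setup:

```sql
-- Illustrative sketch: one platform, independently managed domains.
-- Each domain gets its own database (storage)...
CREATE DATABASE IF NOT EXISTS sales_domain;
CREATE DATABASE IF NOT EXISTS logistics_domain;

-- ...and its own virtual warehouse (compute), sized, suspended and
-- billed independently, so one domain's workload doesn't contend
-- with another's.
CREATE WAREHOUSE IF NOT EXISTS sales_wh
  WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
CREATE WAREHOUSE IF NOT EXISTS logistics_wh
  WITH WAREHOUSE_SIZE = 'MEDIUM' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;
```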

Third, what about the areas under-represented so far in exploring data mesh concepts?

Unlike the well-crafted, low-volume-per-query data exchanges of operational systems, analytic systems often work in millions or billions of rows - and generally provide the most value when combining such data sets from different domains, often in new and even unexpected ways. So prebuilt ‘microservices’ are unlikely to address the breadth of use cases, and could in fact act as constraints to innovation. Data has value when exposed, not when encapsulated or hidden.

Implementing a mesh-like approach will not be simple or low-cost. There are some serious governance questions, for example of ownership and responsibility - the necessity for ‘aggregate domains’ in the manifesto hints at the fun to come. Consumers may find it more complex to build their own solutions from a menu of data products. Resourcing domain-dedicated, but centrally-coordinated, design and engineering teams may require more people, at least initially.

There are also technical challenges - the two biggest being security and performance:

Security is a topic I’ve not seen well-addressed in relation to data mesh. Security requirements are very different in analytic solutions from operational ones. There is a need to balance flexible access with demonstrable compliance, and data product creators may need to know how, and by whom, their products are being consumed downstream, especially if they contain PII or are otherwise sensitive. This type of broad security is typically a strength of database-type solutions, particularly inside organisations. Beyond the organisation boundaries, it becomes a little harder; hence the recent growing interest in data cleanroom solutions - which should ideally still be part of the integrated data platform.
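
As a rough illustration of the kind of granular, platform-level control I mean, here is a hedged Snowflake-style sketch (table, column, policy and role names are hypothetical, and features such as masking policies and the ACCESS_HISTORY view assume an edition that includes them): a masking policy protects a PII column for every consumer of the data product, and the account usage views let the product owner see who has actually been reading it downstream.

```sql
-- Illustrative sketch: protect PII in a data product and audit its use.
CREATE SCHEMA IF NOT EXISTS sales_domain.products;
CREATE TABLE IF NOT EXISTS sales_domain.products.customer_orders (
  order_id STRING, customer_email STRING, order_value NUMBER(12,2));

-- Mask email addresses unless the consumer holds an approved role.
CREATE MASKING POLICY sales_domain.products.pii_email_mask
  AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('CUSTOMER_CARE') THEN val ELSE '***MASKED***' END;

ALTER TABLE sales_domain.products.customer_orders
  MODIFY COLUMN customer_email
  SET MASKING POLICY sales_domain.products.pii_email_mask;

-- Let the product owner see who has been consuming the product downstream.
SELECT ah.user_name, MAX(ah.query_start_time) AS last_read
FROM snowflake.account_usage.access_history ah,
     LATERAL FLATTEN(input => ah.base_objects_accessed) obj
WHERE obj.value:"objectName"::STRING = 'SALES_DOMAIN.PRODUCTS.CUSTOMER_ORDERS'
GROUP BY ah.user_name;
```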

When it came to first-generation data lakes, the problems of complexity were overshadowed by the dire security, leading to them being heavily firewalled. (I still remember a big-data lead telling me proudly that normal mortals would not be allowed near his lake…). Data was often replicated repeatedly in different slices for different audiences, due to the lack of basic granular controls - that copying itself being a compliance nightmare. File-level security is still a common approach, and is still not good enough.

That issue of security also rears its head in any distributed data architecture - how to ensure rules are consistently applied across silos. If you insist that, for compliant sharing, everyone must have their access funnelled via one front-end tool, you have just introduced a new logical and physical bottleneck in front of your supposedly independent systems, with the domain owners now needing to understand two security paradigms.

Performance is a key challenge of using distributed architectures to manage data at analytic scale. What to do if the two separate million-row data products you wish to join each sit in their own physical stores, different to yours? If you copy the data permanently, the result is duplicate data in multiple silos. On the other hand, if you virtualise dynamically, the risk is moving huge volumes of data around repeatedly, with poor query performance. Ideally you want the benefits of a single optimised platform, where data can be joined efficiently across domains which are nevertheless protected by appropriate logical boundaries.
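
By way of contrast, this is roughly what a cross-domain join looks like when both data products live on the same platform (database, table and column names are hypothetical): each product stays in its owning domain’s database, behind its own access controls, yet a suitably privileged consumer can join them in place, with no extract, no replica, and a single optimiser doing the work.

```sql
-- Illustrative sketch: join two domain-owned data products in place.
SELECT o.region,
       SUM(o.order_value)           AS total_sales,
       COUNT(DISTINCT t.ticket_id)  AS support_tickets
FROM sales_domain.products.orders    AS o
JOIN support_domain.products.tickets AS t
  ON t.customer_id = o.customer_id
GROUP BY o.region;
```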

It’s worth drilling a bit more into virtualisation, which has been suggested as a technical approach to mesh. It’s an idea which goes back at least 25 years. I was working with Sybase’s Omni SQL back in the 1990s, and the idea returned repeatedly, for example Composite and Denodo in the 2000s and Presto in the 2010s. While the capabilities of some products are rich and mature, the fundamental challenges of cross-source query optimisation and excessive data movement remain. Improvements in networking bandwidth and caching hardware have hardly kept up with the huge increase in ‘normal’ data volumes, and only some distributed query problems can be cleverly optimised.

In my view, virtualisation has a number of specific sweet spots: predefined, performance-tuned, operational-style applications; or ad-hoc small team experimental exploration, where speed isn’t of the essence; or support for migrations from a mess of silos into a new consolidated data platform. Mainstream self-service analytics falls between these stools, and needs a different solution.

Some implications…

Let’s assume we have reviewed the ideas underpinning data mesh, and concluded that it is relevant for our organisation. What might we want to consider in order to make it work in practice? Here are a few suggestions:

For an idea like data mesh to be successful, addressing the organisational and logical aspects in a concrete, timely way will be crucial. I remain a fan of developing a high-level conceptual business data model as a ‘Month 1’ activity of data governance, though it’s just one element. Most of the data mesh implementation examples I have seen so far skirt around this, and most appear to involve relatively simple business data models, with a dozen or so key entities rather than the 50-150 I’ve seen in more complex businesses. Let’s not underestimate the up-front work.

I agree that we need to get away from ETL, where the clever stuff is done between domains, and move to approaches where data outputs from one domain are directly accessed in the next. For example, landing operational data in near-source format in the analytic platform, maybe via streaming or replication, makes it much easier for the original operational owners to take responsibility for it. To do that, the analytic platform needs to be able to accept structured and semi-structured data economically at whatever volume, without demanding transformation or reduction en route. Once in the analytic space, access between domains ideally needs to be controlled for logical reasons alone, without worrying too much about the performance impact of data retrieval. The best form of copying is no copying at all. And what if we could extend that no-copying idea securely between platform instances, whether across business units or even to partners?
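
In Snowflake terms, that no-copying idea between instances is what secure data sharing is for. A hedged sketch, with hypothetical database, view and account names: the producing domain publishes a curated secure view through a share, and a partner account queries it directly, with no extract and no second copy of the data.

```sql
-- Illustrative sketch: expose a data product to another account without copying it.
CREATE SCHEMA IF NOT EXISTS sales_domain.raw;
CREATE SCHEMA IF NOT EXISTS sales_domain.products;
CREATE TABLE IF NOT EXISTS sales_domain.raw.orders (
  order_date DATE, region STRING, order_value NUMBER(12,2));

-- The data product: a curated, secure view over the raw landed data.
CREATE SECURE VIEW sales_domain.products.daily_sales_v AS
  SELECT order_date, region, SUM(order_value) AS total_sales
  FROM sales_domain.raw.orders
  GROUP BY order_date, region;

-- Share it with a (hypothetical) partner account - no files, no pipeline, no copy.
CREATE SHARE daily_sales_share;
GRANT USAGE ON DATABASE sales_domain TO SHARE daily_sales_share;
GRANT USAGE ON SCHEMA sales_domain.products TO SHARE daily_sales_share;
GRANT SELECT ON VIEW sales_domain.products.daily_sales_v TO SHARE daily_sales_share;
ALTER SHARE daily_sales_share ADD ACCOUNTS = partner_org.partner_account;
```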

The best architectures aim to be as simple as possible whilst meeting the business needs. Complexity and unnecessary data movement both increase as the number of tech platforms grows. The ongoing limitations of some monolithic warehousing platforms, which have simply been ported to the cloud, have led vendors to make a virtue of cloud architectures with several different data stores, very similar to the on-prem mess. Yes, we will need different operational and analytic platforms, and maybe specialised high-speed services for real-time cases, but why not start our analytics with a single platform that supports a wide range of use cases? To allow multiple agile domain product teams to work alongside each other without contention, we need that single data platform to provide shared data access as well as dedicated space, data structures and compute resources for each team.
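
Underneath, that mix of dedicated space and shared access is largely a role and grant design. A minimal sketch, reusing the hypothetical objects from the earlier snippets (the role, schema and warehouse names are equally made up): each domain team owns its working area and its compute, and other teams get read-only access to the published product schema only.

```sql
-- Illustrative sketch: dedicated resources per team, shared read access to products.
CREATE ROLE IF NOT EXISTS sales_team;
CREATE SCHEMA IF NOT EXISTS sales_domain.workspace;
GRANT ALL ON SCHEMA sales_domain.workspace TO ROLE sales_team;  -- private working area
GRANT USAGE ON WAREHOUSE sales_wh TO ROLE sales_team;           -- dedicated compute

-- Another domain's team sees only the published data products, read-only.
CREATE ROLE IF NOT EXISTS marketing_team;
GRANT USAGE ON DATABASE sales_domain TO ROLE marketing_team;
GRANT USAGE ON SCHEMA sales_domain.products TO ROLE marketing_team;
GRANT SELECT ON ALL VIEWS IN SCHEMA sales_domain.products TO ROLE marketing_team;
```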

To democratise data product creation out to domain-oriented teams, the platform needs to be very easy to use and manage, with access to those dedicated resources not requiring technical experts, or being subject to procurement delays. An agile approach to development will require robust full-complexity-and-volume data testing. And if it’s to be truly simple and self-service, a declarative transformation paradigm based on data views can be hard to beat.
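
As a small illustration of that declarative, view-based style (the names and the JSON shape are hypothetical): semi-structured events land untouched in a raw table, and the ‘transformation’ is simply a view that consumers query, with no pipeline code to schedule or babysit.

```sql
-- Illustrative sketch: declarative transformation as a view over raw landed data.
CREATE SCHEMA IF NOT EXISTS sales_domain.raw;
CREATE SCHEMA IF NOT EXISTS sales_domain.products;
CREATE TABLE IF NOT EXISTS sales_domain.raw.order_events (payload VARIANT);

CREATE OR REPLACE VIEW sales_domain.products.orders_v AS
SELECT payload:"order_id"::STRING          AS order_id,
       payload:"customer":"id"::STRING     AS customer_id,
       payload:"amount"::NUMBER(12,2)      AS order_value,
       payload:"created_at"::TIMESTAMP_NTZ AS created_at
FROM sales_domain.raw.order_events;
```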

In summary…

The data mesh manifesto is a bold attempt to address a range of analytic data issues seen especially in larger, more complex organisations. While I don’t agree that its assumptions apply universally, the basic vision is sound. For organisations which identify with the problem statements, it could provide a solid logical approach to tackling those issues, justifying the work which will be needed to adapt it locally and implement it well.

At the level of technical implementation, the cross-domain consistency, coherence, performance, and security needed to meet the vision could imply a highly complex distributed tech design, which will itself become a drag on innovation.

The alternative is to use a data platform which can allow multiple diverse domains, workloads and types of data to coexist, without them compromising their needs or fighting each other over resources, where the secure sharing of data within the organisation and also to partners is simple and highly performant, and where granular business and security rules can be applied globally or locally as required. That’s a vision I can buy into, and I happen to think that I’m currently in the best place to achieve it.

Chris Jackson

I’m a Senior Sales Engineer at Snowflake. Opinions expressed are solely my own and do not necessarily represent the views or opinions of my employer.