De-mesh-tifying Data Mesh 🕸️

Isabella
Target Reply | Insights Hub
7 min read · Apr 10, 2024

A simplified guide to understanding the concept and the impact of a decentralized data architecture

In a nutshell

After my colleague and I hosted a discussion on Data Mesh at a recent corporate event, we realized that a more accessible explanation of the topic would be useful. This article aims to demystify Data Mesh by giving a straightforward overview of its core principles and showing how it tackles the data management challenges that large-scale organizations encounter.

This article is for you if you are looking for an intuitive understanding of the Data Mesh concept and a detailed exploration of a possible implementation using Snowflake as a case study.

Data Mesh

What is a data mesh

Data Mesh is a decentralized data architecture that organizes data by business domain, such as marketing, sales, and customer service. Ownership goes to the data producers, who understand the content best, so they can establish governance policies centered on documentation, quality, and access, effectively treating data as a product. This facilitates self-service use of data and improves data exchange and usability across the enterprise.

While this federated approach eliminates operational bottlenecks associated with monolithic and centralized systems, it doesn’t exclude traditional storage systems like data lakes or data warehouses. Instead, it builds on top of those, combining both:

  • A technical shift that moves from a singular centralized data platform to multiple decentralized data repositories
  • An organizational change that promotes treating data as a product, clear ownership of the domains, decentralization, and federated governance

Data mesh intuition

Think of Data Mesh as a set of building blocks, with various pieces, such as plates and special elements, serving different purposes.

The traditional way of storing building blocks

Traditionally, after a play session, people might toss all the pieces into one bag, making it cumbersome to retrieve specific pieces. This is especially true when multiple people share the same set, as my brother and I did.

Instead, imagine sorting the pieces into different compartments based on their shapes, functions, or characterizing properties.

Data mesh way of storing building blocks

In a decentralized Data Mesh, each type of information is like a specific brick organized into separate containers. Just as a well-organized collection makes it easier to find the right pieces when building something, a Data Mesh structure allows easier access to information and a more efficient exchange and usability.

This decentralized approach ensures that as more bricks (or data) are added, the system remains organized and adaptable, just like constructing new creations with a well-sorted brick collection.
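
To make the analogy concrete, here is a minimal Python sketch. The dataset names and domain labels are invented for illustration and do not come from any real system. The only point is that sorting by domain once turns "search the whole bag" into "open the right compartment".

```python
from collections import defaultdict

# Hypothetical datasets, each tagged with the business domain that produces it.
datasets = [
    ("campaign_clicks", "marketing"),
    ("orders", "sales"),
    ("support_tickets", "customer_service"),
    ("leads", "marketing"),
]

def find_in_one_bag(bag, domain):
    """One big bag: every lookup has to scan all the pieces."""
    return [name for name, d in bag if d == domain]

def sort_into_compartments(bag):
    """Sorted compartments: group the datasets by domain once."""
    compartments = defaultdict(list)
    for name, domain in bag:
        compartments[domain].append(name)
    return dict(compartments)

compartments = sort_into_compartments(datasets)

# Both approaches find the same pieces; the second one only opens one compartment.
print(find_in_one_bag(datasets, "marketing"))
print(compartments["marketing"])
```

Adding a new dataset only touches its own compartment, which is the property the brick analogy is pointing at.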

Data mesh principles

  • Domain ownership: the people who know the data best should be responsible for the quality, processing, governance, and lineage of the data assets.
  • Data as a product: These people should treat data as a product, ensuring the discoverability and trustworthiness of their data output with proper governance, catalogs, and documentation.
  • Self-serve architecture: These processes should occur in a self-serve infrastructure that is domain agnostic, enabling teams to maintain and scale their operations autonomously.
  • Federated governance: While allowing flexibility, federated governance processes should be in place to uphold standards of quality, security, and compliance.
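
As a rough illustration of how these principles might fit together, the sketch below models a data product as a small descriptor owned by a domain team and checks it against a few centrally agreed rules. The `DataProduct` class, the rule names, and the sensitivity tags are all assumptions made for this example, not part of any standard or tool.

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Hypothetical descriptor that a domain team publishes for its data."""
    name: str
    domain: str             # domain ownership: which domain is accountable
    owner: str              # the team responsible for quality and lineage
    description: str = ""   # data as a product: documentation for consumers
    tags: set = field(default_factory=set)

# Federated governance: rules agreed centrally and applied to every domain.
# These three rules are illustrative assumptions.
SENSITIVITY_TAGS = {"public", "internal", "confidential"}
GLOBAL_RULES = {
    "has an owner": lambda p: bool(p.owner),
    "is documented": lambda p: bool(p.description.strip()),
    "has a sensitivity tag": lambda p: bool(p.tags & SENSITIVITY_TAGS),
}

def governance_violations(product):
    """Return the names of the global rules that this product fails."""
    return [rule for rule, check in GLOBAL_RULES.items() if not check(product)]

orders = DataProduct("orders", "sales", "sales-data-team",
                     "One row per confirmed order.", {"internal"})
draft = DataProduct("leads", "marketing", "marketing-data-team")

print(governance_violations(orders))  # passes every rule
print(governance_violations(draft))   # missing documentation and a tag
```

Each domain stays free to build its products however it likes; the shared rules only define the minimum every product must meet before it is published.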

Why a data mesh

Organizations have long struggled with data access, asking questions such as: How can we access data faster? How can we ensure its accuracy and fitness for the use case?

In traditional monolithic architectures, data is scattered across technical and organizational boundaries, so retrieval typically falls to a centralized data engineering team. The result is complex, use-case-specific ETL processes that are difficult to reuse, which hinders scaling to multiple sources and consumers.

A data mesh architecture addresses these challenges by:

  1. decomposing data into domains,
  2. attributing ownership to those with the most profound understanding of the data,
  3. treating data as a product, enhancing the quality for the consumer, and enabling a more seamless collaboration and easier scalability to several use cases

This results in:

  • enhanced trustworthiness and compliance
  • improved interoperability, collaboration, and scalability
  • democratization and self-service access

Use case: implementation with Snowflake

While a Data Mesh may not be the most practical initiative for small organizations with unclear ownership and minimal reusability needs, it holds significant potential for large enterprises seeking to optimize their data platforms. This section explores some of Snowflake’s critical features for managing and sharing Data Products within the data platform, effectively implementing a Data Mesh.

What is Snowflake

Snowflake is a cloud-native data platform designed to provide unparalleled flexibility and scalability, overcoming the limitations of traditional data architectures. Key features include:

  • support for a wide range of data workloads, from data engineering to data science, in one single platform
  • offered as a SaaS solution, requiring minimal maintenance as tasks like version upgrades and scaling are handled automatically
  • secure and governed access to data through encryption at rest and in transit
  • high scalability and resource elasticity through cloud services, separating computing and storage so that they can scale autonomously
  • support for all data types (structured, semi-structured, and unstructured), part of what Snowflake markets as the Data Cloud.

Why Snowflake

No single off-the-shelf platform provides a complete end-to-end solution for implementing a data mesh. However, Snowflake is natively distributed and supports the data mesh concept through the following strengths:

  • Domain Ownership: The platform is cloud-agnostic and available on multiple cloud providers (GCP, AWS, Azure) and cloud regions, allowing each data domain to operate in a decentralized way with dedicated resources.
  • Data-as-a-Product: Snowflake offers a feature called Secure Data Sharing, which enables the secure sharing of data and entire applications within and outside the organization without creating data copies.
  • Self-Serve Data Platform: Offered as a SaaS, Snowflake requires near-zero maintenance, facilitating the creation of a self-service architecture typical of the data mesh.
  • Federated Governance: Snowflake provides various features that simplify the application of governance rules and policies centrally (e.g., role-based access control, row-level access policies, column-level data masking, external tokenization, data lineage, tags and more).
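
To give a feel for what publishing a data product through Secure Data Sharing might look like, the sketch below builds the SQL statements a producer domain could run to share a single table with consumer accounts. The helper function and every object name (`sales_orders_share`, `sales_db`, `myorg.marketing_account`) are hypothetical, and the code only assembles strings rather than connecting to Snowflake. Treat the statements as an outline and check the exact syntax and required privileges against the Snowflake documentation before using them.

```python
def share_statements(share, database, schema, table, consumer_accounts):
    """Build the SQL a producer domain might run to share one table.

    All arguments are plain identifiers chosen by the caller; nothing here
    is validated or executed.
    """
    fq_schema = f"{database}.{schema}"
    return [
        # The share is the container that consumers are given access to.
        f"CREATE SHARE {share};",
        # Shared objects need usage on their parent database and schema.
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO SHARE {share};",
        # Expose the table itself as read-only; no data is copied.
        f"GRANT SELECT ON TABLE {fq_schema}.{table} TO SHARE {share};",
        # Name the accounts that are allowed to consume the share.
        f"ALTER SHARE {share} ADD ACCOUNTS = {', '.join(consumer_accounts)};",
    ]

statements = share_statements(
    share="sales_orders_share",
    database="sales_db",
    schema="products",
    table="orders",
    consumer_accounts=["myorg.marketing_account"],
)
for statement in statements:
    print(statement)
```

Generating the statements from a small function like this is one way a platform team could offer sharing as a self-service step, while the producing domain keeps control over what is exposed and to whom.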

Architecture options for distributed domains in Snowflake

Let’s explore different Snowflake topologies companies have adopted to support distributed domains. These topologies serve as general patterns, with actual implementations varying based on specific requirements and preferences:

  • Account per domain: each domain has its own Snowflake account, offering maximum isolation and enabling multi-region and multi-cloud data mesh with built-in data-sharing capabilities.
Topological patterns: Account per domain
  • Database per domain: each domain resides in separate Snowflake databases within a single account, simplifying user management and governance while allowing independent resource scaling.
Topological patterns: Database per domain
  • Schema per domain: each domain is represented by separate schemas within a single database, offering lower isolation but still enabling independent resource scaling and simplified data management.
Topological patterns: Schema per domain
  • Heterogeneous domain: Domains leverage different IT stacks, with some utilizing Snowflake and others employing alternative systems. This approach can involve cloud-based and on-premises domains. However, it often entails higher complexity to accommodate diverse environments and may contradict the goal of a common, domain-agnostic self-serve platform.
Topological patterns: Heterogeneous domain
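
One way to compare the three Snowflake-only patterns is to look at where the same table ends up under each of them. The sketch below is a simplified illustration: the function, the shared account name `company`, and the default database and schema names are assumptions made for the example, and real deployments will follow their own naming rules.

```python
def locate(topology, domain, table):
    """Return where a domain's table would live under a given topology.

    The result has the (hypothetical) account name and the fully qualified
    database.schema.table name.
    """
    if topology == "account_per_domain":
        # Highest isolation: the domain is the account boundary.
        account, database, schema = f"{domain}_account", "analytics", "public"
    elif topology == "database_per_domain":
        # One shared account; the domain is the database boundary.
        account, database, schema = "company", domain, "public"
    elif topology == "schema_per_domain":
        # One shared account and database; the domain is only a schema.
        account, database, schema = "company", "analytics", domain
    else:
        raise ValueError(f"unknown topology: {topology}")
    return {"account": account, "object": f"{database}.{schema}.{table}"}

for topology in ("account_per_domain", "database_per_domain", "schema_per_domain"):
    print(topology, locate(topology, "sales", "orders"))
```

The further down the list you go, the smaller the container that marks the domain boundary, which mirrors the trade-off between isolation and simpler management described above.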

Conclusion

In conclusion, transitioning to a data mesh paradigm means treating data as a product: teams take ownership of the data they produce, understand it, and improve its quality, discoverability, and interoperability. This requires changes to organizational structure, roles, and incentives, along with a shift in mindset toward product thinking. While data mesh isn’t a universal solution, focusing on these aspects is crucial for success.

With its support for various topologies and features like data sharing and governance, Snowflake can play a pivotal role in facilitating this transition. However, it’s essential to recognize that the success of a data mesh transformation depends on organizational readiness and mindset shifts as much as on technology.
