Part 1 - Overview: Cloud Data Platform Design using Data Fabric or Data Mesh. Why not Both?

Dr. Shweta Shah
Feb 14, 2022

The Himalaya ‘breathes,’ with mountains growing and shrinking in cycles. Credit: Jason Edwards, Nat Geo Image Collection

Part 1 — Hybrid Data Fabric and Data Mesh Framework

Background

An organization's business analysis rests primarily on its operational and analytical data. The design of the data platform lays the foundation for digital transformation, including the change in organizational culture that comes with it. In my opinion, an organizational data strategy is key to keeping the data platform continuously refreshed.

In this series I will share key topics related to the design of a cloud-based data platform. Throughout the series I draw on both traditional and modern data management, data architecture and framework techniques, from the foundational concepts of the DMBOK to advanced frameworks such as Data Fabric and Data Mesh.

The design is not purely conceptual: parts of it have been implemented in organizations I have worked with. Although the design is applicable to many domains, it depends heavily on the organization's business goals and the key questions it is addressing.

This article introduces the design of a cloud data platform that uses a hybrid framework of Data Fabric and Data Mesh to address the requirements described in the "Key Questions & Challenges addressed" section. Fig 1.1 below depicts an assembly line view of how data is produced, ingested, processed, stored and consumed in the different data layers.

Fig 1.1 Assembly Line View of the Data Platform using Finance Domain

Overview of the Design

The design organizes the data using Domain-Driven Design (DDD). It applies centralized Data Fabric techniques while giving end users autonomy through the decentralized Data Mesh framework. Here is a brief description of each component of the design; details follow in the next part of the series.

  • The domain-based design starts at the domain data landing layer, where the data is first organized by domain.
  • An event bus collects the data and provides it to multiple consumers. Data APIs enable data exchanges.
  • In the Finance domain, both batch and real-time data streaming are required. It is open to debate whether a fully event-based microservice architecture can fulfill both requirements; I will discuss this further in later parts of the series.
  • A flexible yet well-defined data security model around the S3 buckets enables easy data access at all levels of consumption.
  • AWS S3 buckets are used for data storage; however, the design is not restricted to them.
  • Operational data is collected using Data Fabric framework techniques.
  • Historical structured and unstructured transactional data is collected and stored for consumption by the analytical data models.
  • The integration of the master and reference data layer is one of the key enablers for reducing data quality challenges across the operational and analytical data sets. A multi-domain MDM model is used to manage this data instead of a traditional MDM design.
  • The analytics platform / science layer is a product of the data platform. It uses the cleansed data to produce data models consumed by BI and other consumers. Data Mesh framework principles are heavily used at this layer.
  • Fit-for-purpose, T-shirt-sized light control methods, rather than federated data governance, are used throughout the platform to manage data ownership, lineage and data quality. Metadata collected throughout the data platform provides data traceability.
  • The data assets of the data platform are managed by the Data Lifecycle Management (DLM) framework. I will explain the DLM framework in a later part of the series.
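
To make the event bus and the domain landing layer more concrete, here is a minimal in-memory sketch in Python. It is not the platform's implementation: the `EventBus` class, the `landing_key` helper, the `finance.payments` topic, the 10,000 threshold and the key layout are all hypothetical names and values chosen for illustration. It only shows the idea of one published event reaching both a batch consumer and a real-time consumer, and of object keys that start with the domain.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event], None]


class EventBus:
    """In-memory stand-in for an event bus: one publish, many consumers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> int:
        """Deliver the event to every subscriber and return the delivery count."""
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


def landing_key(domain: str, source: str, event_date: str, name: str) -> str:
    """Build a domain-first object key for the landing layer (assumed layout)."""
    return f"landing/{domain}/{source}/dt={event_date}/{name}"


def alert_on_large_payment(event: Event) -> None:
    # Real-time consumer: reacts immediately (threshold is an assumption).
    if event["amount"] > 10_000:
        realtime_alerts.append(event["id"])


bus = EventBus()
batch_buffer: List[Event] = []   # drained later by a scheduled batch job
realtime_alerts: List[str] = []

bus.subscribe("finance.payments", batch_buffer.append)
bus.subscribe("finance.payments", alert_on_large_payment)

delivered = bus.publish("finance.payments", {"id": "p-1", "amount": 25_000})
key = landing_key("finance", "payments", "2022-02-14", "p-1.json")
```

In a real deployment the bus would be a managed streaming or messaging service and the key would address an object in storage; the sketch only fixes the shape of the interaction.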

Digital transformation emphasizes better customer experiences through efficient delivery. Automated test data management is a key enabler of this transformation journey.

This design can easily be extended to speed up the implementation of a DataOps and ModelOps platform, which supports business outcomes such as quicker time to market and better operational efficiency.

Key Questions & Challenges addressed

The following core questions and challenges were prioritized for the cloud data platform design shown in Fig 1.1.

Operational and Analytical Data: How to migrate / rebuild on-premise operational data (structured and unstructured), end-user computing and analytical data warehouses in the cloud?

Challenge: SQL-based relational models on the cloud often do not meet the needs of advanced analytics as well as NoSQL-based models do. In large organizations, however, roughly 90% of the available skillset is SQL.

Data Latency: How to manage data latency?

Challenge: Business goals focus on real-time recommendations, but the design needs to balance batch and real-time data streaming.

Data Quality: How to integrate data quality in the cloud data platform?

Challenge: Siloed data models lead to intensive data cleaning activities at the individual project level that cannot scale or be reused.
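
One way to avoid repeating cleaning logic in every project is a shared catalog of data quality rules that any domain can run. The Python sketch below illustrates that idea only; the rule names, the record fields (`customer_id`, `amount`, `currency`) and the `validate` function are assumptions made for the example, not part of the design described in this article.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Rule = Tuple[str, Callable[[Record], bool]]

# Shared rule catalog (hypothetical rules); a rule returns True when the record passes.
SHARED_RULES: List[Rule] = [
    ("customer_id_present", lambda r: bool(r.get("customer_id"))),
    ("amount_non_negative",
     lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0),
    ("currency_is_3_letters",
     lambda r: isinstance(r.get("currency"), str) and len(r["currency"]) == 3),
]


def validate(
    records: List[Record], rules: List[Rule] = SHARED_RULES
) -> Tuple[List[Record], List[Tuple[Record, List[str]]]]:
    """Split records into clean ones and rejected ones with the failed rule names."""
    clean: List[Record] = []
    rejected: List[Tuple[Record, List[str]]] = []
    for record in records:
        failed = [name for name, check in rules if not check(record)]
        if failed:
            rejected.append((record, failed))
        else:
            clean.append(record)
    return clean, rejected


records = [
    {"customer_id": "c-1", "amount": 10.5, "currency": "CAD"},
    {"customer_id": "", "amount": -3, "currency": "CAD"},
]
clean, rejected = validate(records)
```

Because the rejected records carry the names of the rules they failed, the same output can feed data quality reporting across projects.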

Data Ingestion & Consumption: How to manage polyglot inputs and outputs?

Challenge: The types of data sources and data outputs keep growing, but traditional systems rely on dedicated connectors.
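
As an illustration of handling polyglot inputs without a dedicated connector per system, the sketch below uses a small reader registry that normalizes different formats into one record shape. It is a simplified Python example under my own assumptions: the `READERS` registry, the `reader` decorator, the `ingest` function and the two formats shown (CSV and JSON Lines) are hypothetical and use only the standard library.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Record = Dict[str, str]
Reader = Callable[[str], List[Record]]

READERS: Dict[str, Reader] = {}


def reader(fmt: str) -> Callable[[Reader], Reader]:
    """Register a parser for one input format."""
    def register(func: Reader) -> Reader:
        READERS[fmt] = func
        return func
    return register


@reader("csv")
def read_csv(payload: str) -> List[Record]:
    return [dict(row) for row in csv.DictReader(io.StringIO(payload))]


@reader("jsonl")
def read_jsonl(payload: str) -> List[Record]:
    # Values are cast to strings so every format yields the same record shape.
    return [
        {key: str(value) for key, value in json.loads(line).items()}
        for line in payload.splitlines()
        if line.strip()
    ]


def ingest(fmt: str, payload: str) -> List[Record]:
    """Normalize a payload in any registered format into a list of records."""
    if fmt not in READERS:
        raise ValueError(f"no reader registered for format {fmt!r}")
    return READERS[fmt](payload)


csv_rows = ingest("csv", "id,amount\np-1,25\n")
json_rows = ingest("jsonl", '{"id": "p-1", "amount": 25}\n')
```

Supporting a new source type then means registering one more reader rather than changing the ingestion path.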

Data Access: How to provide easy data access with a federated security model?

Challenge: Most data lake designs are based on a single-level data security model intended only for production data lifecycles. Because most data analytics work is experimental, such a security model makes it difficult to access and use data.
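
A possible alternative to a single-level model is to grant permissions per zone, so that experimental work gets looser access than production. The Python sketch below is only an illustration of that idea; the zone names (`sandbox`, `curated`, `production`), the roles and the grants are all assumptions, and in practice the rules would live in the storage platform's own access policies rather than in application code.

```python
from typing import Dict, FrozenSet

READ_ONLY: FrozenSet[str] = frozenset({"read"})
READ_WRITE: FrozenSet[str] = frozenset({"read", "write"})

# Hypothetical zones and roles: looser access for experiments, tighter for production.
ZONE_POLICY: Dict[str, Dict[str, FrozenSet[str]]] = {
    "sandbox": {"data_scientist": READ_WRITE, "analyst": READ_ONLY},
    "curated": {
        "data_scientist": READ_ONLY,
        "analyst": READ_ONLY,
        "data_engineer": READ_WRITE,
    },
    "production": {"data_engineer": READ_WRITE, "service_account": READ_ONLY},
}


def zone_of(key: str) -> str:
    """The first path segment of an object key names its zone (assumed layout)."""
    return key.split("/", 1)[0]


def is_allowed(role: str, action: str, key: str) -> bool:
    """Deny by default: unknown zones, roles or actions get no access."""
    grants = ZONE_POLICY.get(zone_of(key), {})
    return action in grants.get(role, frozenset())


can_experiment = is_allowed("data_scientist", "write", "sandbox/finance/exp.parquet")
can_touch_prod = is_allowed("data_scientist", "write", "production/finance/ledger.parquet")
```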

Data Governance: How to manage data, its ownership, and its organization by team, project or function to enable self-service analytics?

Challenge: Traditional data models are centralized, and so is the governance around them. Business and IT teams struggle to find the right balance in how data is organized.

Data Lifecycle Management: How to manage the end-to-end cycle from data creation to destruction in order to manage operational and audit risks?

Challenge: Data is unorganized and duplicated across multiple systems.
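
To show what managing the creation-to-destruction cycle can look like in code, here is a small Python sketch that assigns a lifecycle stage to a data asset based on its age. The thresholds (one year to archive, seven years to destroy), the stage names and the `legal_hold` flag are hypothetical values for illustration; real retention periods depend on the organization's regulatory and audit requirements.

```python
from datetime import date

# Hypothetical retention thresholds, in days.
ARCHIVE_AFTER_DAYS = 365
DESTROY_AFTER_DAYS = 7 * 365


def lifecycle_stage(created: date, today: date, legal_hold: bool = False) -> str:
    """Return 'active', 'archive' or 'destroy' for a data asset."""
    age_days = (today - created).days
    if age_days < 0:
        raise ValueError("creation date is in the future")
    # An asset under legal hold is never destroyed, only archived.
    if age_days >= DESTROY_AFTER_DAYS and not legal_hold:
        return "destroy"
    if age_days >= ARCHIVE_AFTER_DAYS:
        return "archive"
    return "active"


today = date(2022, 2, 14)
recent = lifecycle_stage(date(2022, 1, 1), today)
older = lifecycle_stage(date(2020, 1, 1), today)
expired = lifecycle_stage(date(2010, 1, 1), today)
held = lifecycle_stage(date(2010, 1, 1), today, legal_hold=True)
```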

Test Data Management: How to deploy and deliver faster?

Challenge: Testing and deployment of the analytical data models rely on traditional test environments.

Sustainable IT Practices: How does the cloud platform control the production and consumption of data using a sustainability function?

In the next article, I will focus on the differences between the Data Fabric and Data Mesh approaches.

Dr. Shweta Shah

Hands-on Data Scientist, Data Management Leader. Head of Data Architecture @SunLife. The research and design on this blog are my own and are not specific to any organization.