Data Mesh on GCP: From Operational Data to Data Product Deployment

Published in Gnomon Digital, Feb 19, 2024

Introduction

In recent years, the data landscape has undergone a seismic transformation, driven by technological advancements, digitalisation, and the proliferation of interconnected devices. Organisations today find themselves swamped with unprecedented volumes of data.

This evolving data landscape has marked the beginning of new opportunities and challenges, necessitating a paradigm shift in how we approach data management.

Zhamak Dehghani’s influential article on Data Mesh has significantly inspired our company to embrace this innovative approach to data architecture.
This model offers a compelling alternative to the limitations of traditional centralised data architectures, aligning with our organisational goals of fostering autonomy and responsiveness.

Data Management journey

This is a journey through the evolution of data management, where Data Mesh stands as a guide in the modern data landscape.

The Era of Big Data: The advent of Big Data marked a turning point, where organisations started grappling with massive datasets that traditional systems struggled to handle. From customer interactions to operational metrics, the sheer volume, velocity, and variety of data presented new possibilities for insights and innovation.

Rise of Decentralised Data Sources: As businesses embraced cloud computing, IoT devices, and distributed systems, data became decentralised. Information flowed not just within the confines of structured databases but across myriad platforms, applications, and external sources. This decentralisation promised agility but also introduced complexities in maintaining data integrity and coherence.

The Need for Real-time Insights: In the age of rapid decision-making, organisations began to demand real-time insights from their data. Traditional batch processing methods fell short, prompting the need for architectures capable of processing and analysing data on the fly.

Security and Compliance Challenges: Simultaneously, the surge in data brought forth heightened concerns about security and compliance. Protecting sensitive information became paramount, necessitating robust data governance frameworks to ensure ethical and legal use.

Enterprises as Data-Driven Entities: As organisations pivoted towards becoming data-driven entities, the imperative to derive actionable insights from data intensified. The ability to harness data for strategic decision-making, predictive analytics, and personalised customer experiences became a competitive advantage.

The Emergence of Data Mesh: In response to these challenges, the concept of Data Mesh emerged as a holistic approach to data management. Data Mesh recognises the decentralised nature of data, emphasising the importance of data domains, data products, and data services. It seeks to provide a scalable, flexible, and efficient framework for navigating the complexities of the modern data landscape.

Why do you need to implement Data Mesh?

The conventional centralised model exemplified by data lakes has encountered scalability issues, particularly for expansive organisations like Adidas, where a singular repository managed by a central team becomes challenging to sustain.

In this type of scenario, Data Mesh emerges as a lifeboat: individual data domains are owned and managed end-to-end by specific domain teams.
Emphasising a product-oriented mindset, Data Mesh envisions a landscape where interconnected and interoperable data products replace traditional isolated structures.

Operational and Analytical Data

In the data landscape, a significant divide exists between operational data and analytical data. Operational data powers business microservices, maintaining transactional state and serving immediate business needs. Analytical data, on the other hand, offers a temporal and aggregated view, supplying insights for machine learning models or analytical reports.
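To make the distinction concrete, here is a minimal Python sketch. The order records, field names, and the `daily_revenue` helper are all hypothetical, invented for illustration. The dictionary holds the current state of each order, as a microservice would, while the aggregation produces the kind of time-bucketed view a report or model would consume.

```python
from collections import defaultdict
from datetime import date

# Operational view: current transactional state, keyed by order id.
# These records are made up for illustration.
orders = {
    "o-1": {"status": "shipped", "amount": 40.0, "day": date(2024, 2, 1)},
    "o-2": {"status": "paid", "amount": 25.0, "day": date(2024, 2, 1)},
    "o-3": {"status": "paid", "amount": 10.0, "day": date(2024, 2, 2)},
}


def daily_revenue(order_state):
    """Analytical view: revenue aggregated per day across all orders."""
    totals = defaultdict(float)
    for order in order_state.values():
        totals[order["day"]] += order["amount"]
    return dict(totals)


revenue = daily_revenue(orders)
for day, total in sorted(revenue.items()):
    print(day.isoformat(), total)
```

The operational structure answers "what is the state of order o-2 right now?", whereas the analytical one answers "how much did we sell each day?". Pipelines that connect the two planes exist to bridge exactly this difference in shape.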

The current landscape reflects this divide, leading to fragile architectures with failing ETL jobs and complex data pipelines attempting to connect these two planes. The analytical data plane itself diverges into data lakes and data warehouses, each catering to specific access patterns.

Data Mesh acknowledges the differences between these planes and aims to connect them under a unique structure based on domains, not technology stacks. Despite technology differences, it emphasises the importance of not separating organisations, teams, and people working on operational and analytical data.

While operational data technology is mature, managing and accessing analytical data at scale remains challenging. Data Mesh focuses on the logical architecture and core principles to address these challenges.

Domain Ownership

Each group in a company that works with a certain kind of data is in charge of that data, so ownership follows what each group does. For example, if a media company has one team handling podcasts and another handling artists, each is responsible for its own data.

Data as a product

Data Mesh wants us to think of data like a product on a shelf. It should be easy to find, safe to use, and understandable. Each area, or ‘domain’, in a company should treat its data like a valuable product, making sure it’s good quality and useful.
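One way to picture this is a small descriptor that a domain team publishes alongside its dataset. The sketch below is only an illustration: the `DataProduct` class, its fields, and the podcast example values are assumptions, not part of any Data Mesh standard or library.

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team publishes with its dataset."""

    name: str                 # easy to find: a unique, searchable name
    owner_domain: str         # the domain team accountable for the data
    description: str          # understandable: what the data means
    schema: dict              # understandable: column name -> type
    access_roles: tuple = ()  # safe to use: roles allowed to read it

    def missing_fields(self):
        """Return the names of descriptor fields that were left empty."""
        return [key for key, value in vars(self).items() if not value]


podcasts = DataProduct(
    name="podcast-episodes",
    owner_domain="podcasts",
    description="One row per published podcast episode.",
    schema={"episode_id": "string", "published_at": "timestamp"},
)

# The descriptor is incomplete until read access has been defined.
print(podcasts.missing_fields())
```

Keeping the descriptor next to the data gives consumers a single place to learn what the product is, who owns it, and who may use it.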

Self-serve data platform

Imagine a tool that makes it easy for teams to handle the technical side of data without needing experts every time. Data Mesh suggests having a platform that helps teams set up and use the tools they need for their data, giving them more control.

Federated computational governance

Even though every team manages its own data, there are some rules everyone follows. These shared rules help data from different teams integrate smoothly.
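The "computational" part means these shared rules can be checked automatically rather than by hand. As a minimal sketch, assuming two made-up rules and metadata stored as plain dictionaries (all names here are hypothetical), a platform could run the same checks against every domain's data product:

```python
def has_owner(product):
    """Global rule: every data product must name an owning team."""
    return bool(product.get("owner"))


def uses_lowercase_columns(product):
    """Global rule: column names are lowercase and contain no spaces."""
    return all(
        col == col.lower() and " " not in col
        for col in product.get("columns", [])
    )


# Rules agreed across all domains (hypothetical examples).
GLOBAL_POLICIES = {
    "has_owner": has_owner,
    "lowercase_columns": uses_lowercase_columns,
}


def check_product(product, policies=GLOBAL_POLICIES):
    """Return the names of the global policies this product violates."""
    return [name for name, rule in policies.items() if not rule(product)]


artists = {"name": "artists", "owner": "artists-team", "columns": ["artist_id", "country"]}
podcasts = {"name": "podcasts", "owner": "", "columns": ["Episode Id"]}

violations = {p["name"]: check_product(p) for p in (artists, podcasts)}
print(violations)
```

Each domain keeps control of its own data, while the shared policy set stays small and is applied uniformly to everyone.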

In summary, Data Mesh is about giving power to the teams closest to the data, treating data like a valuable product, providing tools for easy data handling, and having agreed-upon rules while allowing flexibility.

Advantages of Data Mesh

Adopting a Data Mesh architecture offers a myriad of advantages, revolutionising how organisations approach data. Here are some key benefits:

Scalability: Helps organisations grow their data systems smoothly by sharing data responsibilities among different teams.
Flexibility: Adapts easily to business changes by using flexible, modular data services.
Improved Data Accessibility: Breaks down data barriers, letting teams work together seamlessly with their data.
Innovation: Encourages innovation by letting teams take ownership and experiment with their data products.
Cost-Efficiency: Saves costs by letting teams manage their areas independently, optimising resources as needed.
Adaptability to Diverse Data Types: Handles different types of data (structured, semi-structured, and unstructured), essential in the era of Big Data.
Decentralised Governance: Manages data rules uniquely for each area, following overall standards in a federated way.

Data Mesh stands as a progressive leap in data management, offering advantages that are particularly crucial in the fast-paced and evolving landscape of modern data ecosystems. Its ability to provide scalability, flexibility, improved data accessibility, and a conducive environment for innovation sets Data Mesh apart from traditional approaches, making it a transformative force for organisations navigating the challenges of decentralised data.

How does Data Mesh fit in our project?

This series of articles aims to guide you through the implementation of Data Mesh, offering practical insights and strategies to leverage its principles effectively.

Teaser of the Series: Each article in this series will delve into a specific aspect of implementing Data Mesh, providing a comprehensive guide for integrating this innovative approach into your projects.

Part 1/4: Data Plane-Build ETL Pipelines with Medallion Architecture

Learn how to seamlessly pull data from diverse sources, specifically focusing on NASA’s open data about Earth’s temperature. Explore the Medallion Architecture and understand its role in structuring ETL pipelines within Data Mesh.
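As a rough preview of the idea, the Medallion Architecture is commonly described as three layers: bronze (raw data as ingested), silver (cleaned and typed), and gold (aggregated for consumers). The sketch below is an assumption-laden illustration, not the pipeline from Part 1: the field names, helper functions, and values are made up and are not real NASA data.

```python
from statistics import mean

# Bronze: raw records exactly as ingested (made-up values, not NASA data).
bronze = [
    {"year": "2020", "anomaly_c": "1.01"},
    {"year": "2020", "anomaly_c": "0.99"},
    {"year": "2021", "anomaly_c": "n/a"},  # malformed reading
    {"year": "2021", "anomaly_c": "0.85"},
]


def to_silver(rows):
    """Silver: typed, validated records; malformed rows are dropped."""
    clean = []
    for row in rows:
        try:
            clean.append(
                {"year": int(row["year"]), "anomaly_c": float(row["anomaly_c"])}
            )
        except ValueError:
            continue  # skip rows that cannot be parsed
    return clean


def to_gold(rows):
    """Gold: one aggregated value per year, ready for consumers."""
    by_year = {}
    for row in rows:
        by_year.setdefault(row["year"], []).append(row["anomaly_c"])
    return {year: mean(values) for year, values in by_year.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Separating the layers means a bad raw record never reaches consumers, and the raw data stays available if the cleaning rules change later.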

Part 2/4: Data Plane-Orchestrate ETL Pipelines with Airflow

Explore Airflow, where we’ll guide you in creating a Directed Acyclic Graph (DAG) and automating daily tasks for pulling data. Witness the orchestration capabilities that make Data Mesh adaptable to dynamic data workflows.

Part 3/4: Data Product-Build an API to Expose Dataset As a Product

Practice FastAPI as we guide you through creating an API to retrieve the data. Witness firsthand how Data Mesh facilitates the development of data products, emphasising the value of treating data as a product tailored to specific business needs.

Part 4/4: Data Platform-Deploy the NASA API using Terraform Template on GCP

Explore the final phase of Data Mesh implementation by deploying the NASA API using Terraform on Google Cloud Platform. Witness the seamless integration of Data Mesh with cloud services, highlighting the platform’s capability to support a decentralised and scalable architecture.

Conclusion

Recapping the journey, Data Mesh proves itself as a pivotal player in modern data architectures. Its decentralised approach, focus on domain-oriented ownership, and encouragement of a product-oriented mindset position it as a dynamic solution for organisations navigating the complexities of contemporary data landscapes.

As we conclude, we encourage you to explore each article in this series for practical implementation strategies. Whether you’re implementing ETL pipelines, orchestrating workflows, developing data products, or deploying APIs on cloud platforms, Data Mesh serves as your guide to achieving a resilient, scalable, and innovative data infrastructure.

Next chapter > Part 1/4: Data Plane-Build ETL Pipelines with Medallion Architecture