Recipe for building your first Data Product in a Data Mesh
A journey of a thousand miles begins with a single step. For a Data Mesh, this journey begins with a single Data Product.
In this article, I will outline a recipe for building your first Data Product. Like all recipes, you are welcome to adjust it to your context and preferences.
To bring this article to life, I will also provide an example of what a Data Product architecture looks like on Google Cloud.
This article assumes that the reader is either the business sponsor for the Data Mesh, or someone who is part of the Data Platform team enabling the Data Mesh.
1. Find a business case that can be solved with Data Mesh
First, you need to find a business sponsor who has a specific business opportunity or challenge that cannot be addressed effectively on existing data infrastructure. A few examples are:
- Provide new service offerings through APIs, potentially leveraging predictive analytics
- Build an integrated customer database for improving customer service, marketing, and operations
- Streamline risk and regulatory reporting
The ROI for solving this challenge or capturing this opportunity needs to justify onboarding a small team of full-time engineers to build and run the solution. Ideally, this business sponsor should 1) have a track record of delivering transformative change programmes and 2) be an early adopter in the technology adoption lifecycle.
2. Ensure business readiness and acquire funding
Second, you need to get buy-in from this business sponsor to own the development of this new Data Product. It is crucial that they understand the goals of a Data Mesh and their role in building this Data Product. Specifically:
- The goal of the Data Mesh is to empower distributed teams to create business value from data. It aims to create a network effect by treating Data as a Product and providing distributed teams with a self-service Data Platform that has embedded governance.
- The business sponsor owns the business requirements for this new Data Product and is accountable for the end-to-end product development lifecycle.
- The business sponsor will allocate funding for a cross-functional team to deliver the Data Product. This team is responsible for putting together and delivering on a product roadmap to ensure the Data Product captures value for the given business opportunity. They are also responsible for day-2 operations as well as any ongoing enhancement and maintenance of the product.
- Progress is measured in terms of business outcomes achieved, and funding for the Data Product team can be adjusted upon business review without a change control process.
3. Prove Data Platform readiness and empower people for success
Third, ensure your Data Platform has the technical capability to onboard this new Data Product. You need to ensure that any technology or infrastructure services required by the Data Product are approved for use within your organisation.
The goal is to reduce friction and lower the barrier to entry for adopting these services. As such, there should be clearly documented security guardrails, developer guides, runbooks, and architecture blueprints on how to use these services.
Crucially, the service should be provided as a self-service capability. This can take the form of an infrastructure-as-code module that deploys a cloud database, or a templated data pipeline that distributed teams can easily adopt and extend (a minimal sketch follows this list). The guidance also needs to cover:
- The production deployment process, following CI/CD best practices
- How to meet platform-level security and governance guardrails
- How to integrate with the platform’s day-2 operations tooling for logging, monitoring, observability, audit endpoints, etc.
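To make this concrete, here is a minimal Python sketch of what such a self-service capability might look like: a hypothetical platform-provided module that creates a BigQuery dataset with governance guardrails (mandatory labels, a default table expiry, optional CMEK encryption) applied by default. The function name `create_governed_dataset`, the label conventions, and the specific guardrail values are illustrative assumptions, not a prescribed implementation.

```python
# Hypothetical self-service module published by the Data Platform team.
# Distributed teams call create_governed_dataset() instead of hand-crafting
# datasets, so platform guardrails are applied consistently.
from typing import Optional

from google.cloud import bigquery


def create_governed_dataset(
    project_id: str,
    dataset_id: str,
    data_product: str,
    owner_team: str,
    kms_key_name: Optional[str] = None,  # optional CMEK key, if mandated
) -> bigquery.Dataset:
    """Create a BigQuery dataset with the platform's governance defaults."""
    client = bigquery.Client(project=project_id)
    dataset = bigquery.Dataset(f"{project_id}.{dataset_id}")

    # Guardrail: mandatory labels so assets are attributable and auditable.
    dataset.labels = {"data_product": data_product, "owner_team": owner_team}

    # Guardrail: default table expiry (90 days) unless explicitly overridden.
    dataset.default_table_expiration_ms = 90 * 24 * 60 * 60 * 1000

    # Guardrail: customer-managed encryption keys, where required.
    if kms_key_name:
        dataset.default_encryption_configuration = bigquery.EncryptionConfiguration(
            kms_key_name=kms_key_name
        )

    return client.create_dataset(dataset, exists_ok=True)


if __name__ == "__main__":
    create_governed_dataset(
        project_id="my-tenant-project",  # hypothetical tenant project
        dataset_id="customer_360",
        data_product="customer-360",
        owner_team="customer-domain",
    )
```

Publishing guardrails as code in this way means distributed teams get compliance by default, rather than by documentation alone.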
As a Data Platform team, you are successful when a distributed team can deploy and manage a Data Product without needing to involve you in the process. This proof-point can only be achieved once you’ve onboarded a number of Data Product teams as customers.
In the early stages, you may benefit from finding a customer who is willing to co-invest in building the Data Platform components they need. Your priority as a Data Platform team is to develop the capabilities for this Data Product in a way that other customers of the platform can reuse, while your customer’s priority is to ensure the platform is fit for purpose for a live Data Product application.
You may also benefit from embedding Data Engineers from the Data Platform team into the business domain team. In this engagement model, the business domain team owns the product backlog, while the Data Engineers provide the technical expertise to deliver on the backlog as part of the same team. At the end of the engagement, the Data Engineers can choose to stay in the product team or return to the Data Platform team to start repackaging what they developed for reuse by other Data Platform customers.
It is worth noting that the Data Platform itself is an internal product and, in the words of Matthew Skelton of Team Topologies fame, should be a “curated experience for engineers”. The Data Platform’s goals of empowerment, lowering technical barriers, and minimising collective technical debt are all in service of the communities using it, such as the Data Product teams; achieving this purpose therefore requires constant feedback from its customers and continual refinement.
4. Tell your success story
Developing a Data Product is an incremental process; as such, you will benefit from setting intermediate milestones to regularly review progress and the business impact achieved with your sponsors. This is also your opportunity to celebrate successes and share your story with the rest of the organisation in order to attract additional customers to the Data Mesh.
5. Rinse and repeat
With every Data Product, you should aim to shorten the time-to-value for the next set of Data Products onboarded onto the Data Platform. The value of the overall Data Mesh should be at least proportional to the combined value of your Data Products. As the number of Data Products grows, the value of the Data Mesh should grow superlinearly thanks to the network effect: with n Data Products there are n(n-1)/2 potential pairwise combinations, so each new Data Product can build on top of all the ones before it.
As you can guess, adopting Data Mesh may require a huge shift in how you think about Data Culture and Data Operating Model. I have previously discussed cultural and organisational challenges you might encounter when adopting Data Mesh in 10 reasons you are not ready to adopt Data Mesh. The goal of this article is to balance out the message from the previous article with a key message: you can start small and evolve into the full Data Mesh, one Data Product at a time!
Example Data Product Architecture on Google Cloud
Many organisations have built successful Data Products on Google Cloud, leveraging our comprehensive suite of serverless and integrated data services such as:
- BigQuery for serverless data warehousing (see the short example after this list)
- Dataflow for unified batch and streaming data processing
- Dataproc for serverless Spark and other open-source data analytics applications
- Looker for embedded analytics and visualisation
- Vertex AI for end-to-end MLOps
- Cloud Composer for a managed Apache Airflow workflow orchestration service
- Cloud Spanner for a serverless and battle-hardened relational database service
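As a small illustration of the serverless model, the sketch below runs an ad hoc aggregation with the BigQuery Python client; there are no clusters to provision or size. The project, dataset, and table names are placeholders.

```python
# Minimal example of BigQuery's serverless model: submit SQL, get results,
# with no infrastructure to manage. All names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-tenant-project")  # hypothetical project

query = """
    SELECT customer_segment, COUNT(*) AS customers
    FROM `my-tenant-project.customer_360.profiles`
    GROUP BY customer_segment
    ORDER BY customers DESC
"""

# The query runs on BigQuery's managed capacity; we simply iterate the rows.
for row in client.query(query).result():
    print(f"{row.customer_segment}: {row.customers}")
```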
The diagram below illustrates a typical Data Product architecture on Google Cloud. The diagram was created with the Google Cloud Architecture Diagramming Tool, courtesy of @pvergadia.
In this architecture, each Google Cloud tenant has their own project(s) where they can manage the end-to-end lifecycle of their Data Products, leveraging deployment templates and guardrails provided by the Data Platform team. Most of the services are serverless so there is minimal infrastructure to manage, allowing Data Product teams to focus on generating value from their data. To ensure global interoperability and governance, Data Platform tenants can additionally benefit from the following horizontally integrated services across the Google Cloud data stack:
- Analytics Hub for zero-copy data sharing with BigQuery
- Dataplex for unified governance across Data Lakes and Data Warehouse environments
- Cloud IAM for unified access management across all Google Cloud services
- Cloud Operations Suite for integrated logging and monitoring
- Cloud Asset Inventory for auditing assets across all projects and services
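As one example of how these horizontal services support governance, the sketch below uses the Cloud Asset Inventory Python client to list the BigQuery assets in a tenant project, the kind of audit sweep a Data Platform team might run across all tenants. The project name is a placeholder.

```python
# Sketch of an audit check with Cloud Asset Inventory: list all BigQuery
# datasets and tables in a tenant project. The project name is a placeholder.
from google.cloud import asset_v1

client = asset_v1.AssetServiceClient()

request = asset_v1.ListAssetsRequest(
    parent="projects/my-tenant-project",  # hypothetical tenant project
    asset_types=[
        "bigquery.googleapis.com/Dataset",
        "bigquery.googleapis.com/Table",
    ],
    content_type=asset_v1.ContentType.RESOURCE,
)

# Iterate the paged results and print each asset for review.
for asset in client.list_assets(request=request):
    print(asset.name, asset.asset_type)
```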
For more information on how to build a Data Mesh on Google Cloud, please see the new Google Cloud Data Mesh Whitepaper published as part of the Dataplex GA launch.
I hope you found this article helpful. Is it missing something? Let me know what you think by reaching out to me on Twitter at @thinh_ha or on LinkedIn.
Special thanks to my colleagues Akhilesh Singh, Anant Vikram, and Rubén Fernández, who helped review this article.