The difference between data catalogs and data product catalogs.

Wannes Rosiers
Published in Data Mesh Learning
Jul 11, 2024

Once again, the Data Mesh Learning Community organized its monthly round-table, this time on Data Catalogs versus Data Product Catalogs. The session was moderated by Andrew Sharp, Amy Raygada and myself, Wannes Rosiers. Adding the word “Product” may look like a small change, but the difference is much more profound. The discussions made that clear, and they are continuing in the Data Mesh Learning Slack channel #focus-data-catalogs.

Meetup announcement — Data Mesh Learning Community

The impact of “Product”

The term Data Product is gaining ground rapidly, yet it remains quite vague. To understand its full scope, it is best to start from what a product is. A product is desirable, meaning someone is willing to buy it; feasible, meaning you can build it (you have the technical capabilities and resources); and viable, meaning you can build it at a price that makes it worth it.

A data product aims to fulfill a customer’s desire. Photo by Alexis Fauvet on Unsplash

In the world of data, Data Products seem to focus much more on this desirability: “What does my consumer want?” A participant from the open data world indicated that they use the term dataset there, because the publisher is obliged to offer the dataset regardless of the consumers’ needs. From the moment you take into account what the consumer wants, it becomes a data product. But it goes beyond what the consumer wants: it is also about “What does my consumer need to use the data?” This leads to some interesting elements being part of a Data Product:

  • Data, with business domain logic applied. This means you can abstract away the raw and cleaned layers (or bronze and silver layers): it’s all about the final layer!
  • Data quality (and other SLA) measures: without trust, no one uses data, so you need to earn trust by showcasing your quality.
  • Data contracts, which are there to increase stability. Since a Data Product can serve multiple purposes and have multiple output ports, Data Products and Data Contracts do not map one-to-one, although that is typically how it starts.
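To make these elements a bit more tangible, here is a minimal Python sketch of how a data product, its output ports and its data contracts could relate. The class names (`DataProduct`, `OutputPort`, `DataContract`) and fields such as `freshness_hours` and `min_quality_score` are hypothetical, chosen for illustration; they are not an existing standard or any specific tool’s API. The product starts with one port and one contract, and the one-to-one mapping disappears as soon as a second output port is added.

```python
from dataclasses import dataclass, field


@dataclass
class DataContract:
    """Hypothetical contract: what one output port promises its consumers."""
    name: str
    schema: dict[str, str]      # column name -> type
    freshness_hours: int        # example SLA: maximum age of the data
    min_quality_score: float    # example quality threshold between 0 and 1


@dataclass
class OutputPort:
    """Hypothetical output port: one way consumers can read the product."""
    name: str
    contracts: list[DataContract] = field(default_factory=list)


@dataclass
class DataProduct:
    """Hypothetical data product: final-layer data plus the promises around it."""
    name: str
    owner_team: str             # one team manages the product
    output_ports: list[OutputPort] = field(default_factory=list)

    def contracts(self) -> list[DataContract]:
        # Collect the contracts of every port; with several ports (or several
        # contract versions per port) this is no longer a one-to-one mapping.
        return [c for port in self.output_ports for c in port.contracts]


# Typically a product starts with a single port and a single contract ...
orders = DataProduct(
    name="orders",
    owner_team="sales",
    output_ports=[
        OutputPort("orders_table", [
            DataContract("orders_v1", {"order_id": "string", "amount": "decimal"}, 24, 0.95),
        ]),
    ],
)

# ... and later serves a second purpose through an extra output port.
orders.output_ports.append(
    OutputPort("orders_monthly_kpis", [
        DataContract("orders_kpis_v1", {"month": "date", "revenue": "decimal"}, 720, 0.99),
    ])
)
```

Attaching contracts to output ports rather than to the product as a whole is just one way to reflect that a product can serve multiple purposes; other designs are equally possible.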

A Data Product is managed by one team, yet multiple teams can collaborate on it. It is in this collaboration or build phase that our round-table participants agreed less. Some say the infrastructure and compute are not part of the data product, as the consumer should not need to know about them; others say they are relevant. This seems to vary depending on whether you take a consumer or a producer angle, but other angles lead to this differentiation as well: in certain settings, consumers might be advised to use the product that uses the fewest resources, i.e. the one with the smallest footprint.

Introducing the Data Product Catalog

Data Catalogs focus on the structure of data, data definitions and what the data can be used for. As you mature towards Data Products, you can consider Data Catalogs crucial in the experimentation phase, and Data Product Catalogs the place to be when you want to make decisions based on data.

A Data Product Catalog contains more high-level metadata and might link to a Data Catalog when you want to drill down to low-level metadata. In addition, a Data Product Catalog might contain more social elements: the footprint, the rating, the usage, … Scanning the vendor space, it is also very clear that a Data Product Catalog differentiates itself by focusing on the whole experience: everything that enables data owners to take on their responsibilities and makes people more productive belongs in a Data Product Catalog. Eric Broda described it as follows:

“I think the emphasis today for data catalogs is far too often the governance professional. But I think data mesh, AI, and faster development needs suggest the new focus must be the user and developer experience (which is quite different than what traditional catalogs offer)”

To continue on the consumers’ needs: very useful additional metadata include the definition of a domain and its bounded context, user satisfaction and ratings, and other kinds of collaborative features. And when moving beyond tables towards ML models, which can also be considered Data Products, consumers might be interested in the number of dimensions used in that model.
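As an illustration of the kind of metadata discussed above, the sketch below models a catalog entry with high-level, consumer-facing metadata: domain and bounded context, a link to drill down into a Data Catalog, a footprint, ratings and usage. All field names, the example.com URLs and the ranking rule (best rated first, then most used, then smallest footprint) are assumptions made for this example, not how any particular Data Product Catalog works.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical Data Product Catalog entry with high-level, consumer-facing metadata."""
    product: str
    domain: str
    bounded_context: str
    data_catalog_url: str           # hypothetical link to drill down to low-level metadata
    footprint_kwh_per_month: float  # hypothetical footprint measure
    monthly_queries: int = 0        # usage signal, e.g. derived from query logs
    ratings: list[int] = field(default_factory=list)

    def rate(self, stars: int) -> None:
        # Collaborative feature: consumers rate the product from 1 to 5 stars.
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings.append(stars)

    @property
    def average_rating(self) -> float:
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


def rank(entries: list[CatalogEntry], domain: Optional[str] = None) -> list[CatalogEntry]:
    """Best rated first, then most used, then smallest footprint (an assumed rule)."""
    candidates = [e for e in entries if domain is None or e.domain == domain]
    return sorted(
        candidates,
        key=lambda e: (-e.average_rating, -e.monthly_queries, e.footprint_kwh_per_month),
    )


entries = [
    CatalogEntry("orders", "sales", "order management",
                 "https://datacatalog.example.com/tables/orders", 30.0, monthly_queries=1200),
    CatalogEntry("customers", "sales", "customer management",
                 "https://datacatalog.example.com/tables/customers", 10.0, monthly_queries=3000),
    CatalogEntry("shipments", "logistics", "delivery",
                 "https://datacatalog.example.com/tables/shipments", 5.0, monthly_queries=50),
]
for entry, stars in [(entries[0], [5, 4]), (entries[1], [4, 4]), (entries[2], [5])]:
    for s in stars:
        entry.rate(s)

print([e.product for e in rank(entries, domain="sales")])  # ['orders', 'customers']
print([e.product for e in rank(entries)])                  # ['shipments', 'orders', 'customers']
```

The entry deliberately stores only a link to the Data Catalog instead of copying schemas and column definitions, which keeps the product catalog at the high level while still allowing a drill-down.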

Moving beyond the data governance professional

While a data product catalog again includes many governance features (sources and lineage of data, metadata, trust through quality, …), sometimes even more extensively than traditional data catalogs (e.g. health observability), it should focus much more on the user and developer experience.

It’s all about user experience — Photo by Kelly Sikkema on Unsplash

This turns a data product catalog into a bridge between the components of your modern data landscape, which simplifies the life of a developer. From a data product catalog, you can easily move to your data catalog, data exploration tool, data development workbench, compute landing zone, and many more. But it also means that much more functionality gets incorporated into the data product catalog.

This functionality can concern additional information about usage patterns (popularity), or workflow automation. These workflows include:

  • Requesting to create a new data product
  • Requesting and approving to collaborate on a data product
  • Requesting and granting access to an output port of a data product

Note that all of these workflows should have an automated follow-up; otherwise they do not simplify people’s lives and merely introduce new friction points. Traditional data catalogs are also growing towards these collaborative features, but from their governance perspective.
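To illustrate what an automated follow-up could mean for the access workflow, here is a minimal sketch in which approving a request immediately triggers the grant, instead of opening a ticket for someone to handle manually. The `AccessRequest` and `AccessWorkflow` classes and the injected `grant` callable are hypothetical; in practice that callable would be whatever access-management integration your platform offers.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Status(Enum):
    REQUESTED = "requested"
    APPROVED = "approved"
    REJECTED = "rejected"
    FULFILLED = "fulfilled"


@dataclass
class AccessRequest:
    """Hypothetical request for access to one output port of a data product."""
    requester: str
    product: str
    output_port: str
    status: Status = Status.REQUESTED
    history: list[str] = field(default_factory=list)


class AccessWorkflow:
    """Hypothetical workflow: approval by the owning team triggers an automated grant."""

    def __init__(self, grant: Callable[[AccessRequest], None]):
        # `grant` is the automated follow-up, e.g. a call into your platform's
        # access management; here it is an injected, hypothetical callable.
        self._grant = grant

    def decide(self, request: AccessRequest, approved: bool, approver: str) -> None:
        if request.status is not Status.REQUESTED:
            raise ValueError(f"request already {request.status.value}")
        if not approved:
            request.status = Status.REJECTED
            request.history.append(f"rejected by {approver}")
            return
        request.status = Status.APPROVED
        request.history.append(f"approved by {approver}")
        # Automated follow-up: no manual hand-over after the approval.
        self._grant(request)
        request.status = Status.FULFILLED
        request.history.append("access granted automatically")


# Usage: record grants in a set instead of calling a real platform.
granted: set[tuple[str, str, str]] = set()
workflow = AccessWorkflow(grant=lambda r: granted.add((r.requester, r.product, r.output_port)))

req = AccessRequest("alice", "orders", "orders_table")
workflow.decide(req, approved=True, approver="sales-team")
print(req.status, req.history)
```

Injecting the follow-up as a callable keeps the approval logic independent of the system that actually provisions access; the same pattern could be applied to requests to create or collaborate on a data product.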

Moving towards business users?

One of the closing thoughts was that, compared to data catalogs, data product catalogs are much more oriented towards business users. Yet data catalogs already targeted a business user: the data steward.

So where does this feeling come from? Data products group data and data access into logical combinations that fulfill a certain purpose. This coarser-grained level feels naturally aimed at business users. In addition, popularity ratings and promoting what is actually being used, and hence trusted, also seem to target business users directly. Someone else raised the question: “Is a data product catalog mainly meant to educate and inform business users and data consumers?”

The answer, though, was “No, it’s a mix.” It focuses on both the business user and the producer. The shift for the business user is that we are moving from a governance catalog to an actionable catalog, and exactly the same holds for the data producer.


Wannes Rosiers: Data Mesh Learning MVP. Currently building Conveyor, previously data engineering manager at DPG Media. Firm believer in the value of data.