Data products — Where theory meets practice

A panel discussion by Jean-Georges Perrin, Nikki Bueno de Mesquita and Wannes Rosiers

Wannes Rosiers
Data Mesh Learning
8 min read · Jun 10, 2024

Two Data Mesh Learning MVPs walk into a university auditorium and encounter a head of data products… Luckily it’s not the start of a bad joke, but the introduction to a panel discussion on data products. Together with Data Science Leuven, the Data Mesh Learning community hosted an expert panel discussion on Data Products in Practice. The panel, moderated by Jules De Bruin, consisted of valued industry experts:

  • Jean-Georges Perrin of AbeaData and ex-PayPal. He is one of the driving forces behind the Open Data Contract Standard at Bitol and is recognized as a Data Mesh Learning MVP for his evangelism on data products, data contracts and data mesh as a whole.
  • Nikki Bueno de Mesquita of Otrium. As one of the first people with an official job title as head of data products, she has practical experience managing products on pricing and recommendations at Otrium.
  • Wannes Rosiers of Conveyor and ex-DPG Media. He is a seasoned data leader who has guided multiple data mesh transitions. Always open to sharing knowledge and discussing data products, data platforms and data organizations, Wannes has also been recognized as a Data Mesh Learning MVP.

What is a data product?

First things first: if you want to talk data products, you need to make sure everyone understands the same thing by the term data product. Jean-Georges has two different definitions for it. When you are talking to business people, you can refer to it as: “You know what a product is? Well now it is a product offering data with all the lifecycle management linked to it.” When talking to engineers, on the other hand, and specifically data engineers, you can refer to it as “an assembly of data contracts”.

The main difference with a dataset is, according to Nikki, that a data product is designed to bring value. The better you can explain the value of a data product, the better you can sell both the need for product thinking and the data product itself internally. The second characteristic is that you use it on a regular basis, because only then will you be willing to maintain it as a product.

Wannes, finally, refers to the conceptual model of a product, where a product is the intersection of something viable and desirable.

Can we make the product? Are consumers interested in buying the product?

There is of course more to this. To create the product, you need both the data and the technical capabilities. To raise interest, the product must be known, desired, valuable, legally compliant and usable. As a consequence, a data product is a unit of the data and everything needed to use the data.

Designing a data product

Both Nikki and Wannes pointed to the reusability of a data product, both over time (“I want to use this product on a regular basis”) and across different use cases. Data products are often used as assets for new data products, resulting in a chain of data products.

Chain of Data Products — Image by author
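Such a chain can be pictured as a small dependency graph. The sketch below is purely illustrative: the product names, the owning teams and the `upstream` helper are assumptions made for this example, not something presented by the panel.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """A data product with an owning team and the products it consumes."""
    name: str
    owner: str
    inputs: tuple["DataProduct", ...] = ()


def upstream(product: DataProduct) -> list[DataProduct]:
    """Return every product this one depends on, directly or indirectly."""
    seen: dict[str, DataProduct] = {}
    stack = list(product.inputs)
    while stack:
        current = stack.pop()
        if current.name not in seen:
            seen[current.name] = current
            stack.extend(current.inputs)
    return sorted(seen.values(), key=lambda p: p.name)


# Hypothetical chain: two source products feed a pricing product,
# which in turn feeds a recommendations product.
orders = DataProduct("orders", owner="sales")
catalog = DataProduct("catalog", owner="merchandising")
pricing = DataProduct("pricing", owner="data-products", inputs=(orders, catalog))
recommendations = DataProduct(
    "recommendations", owner="data-products", inputs=(pricing, catalog)
)

chain = upstream(recommendations)
print([p.name for p in chain])          # names of all upstream products
print(sorted({p.owner for p in chain}))  # teams the product relies on
```

Walking the graph like this makes the reuse visible: the hypothetical `catalog` product serves two consumers, and the recommendations product depends on teams it does not own.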

As a data product must create value, it must fulfill a specific purpose. Fit-for-purpose data products, however, don’t take reusability into account. This results in a balancing exercise in your design. This topic will be discussed by Kinda El Maarry of Prima and Wannes in an upcoming webinar on the impact of product thinking for data.

Webinar announcement — Image by Conveyor

Next to this balancing act, there is another data product design trade-off to be made. You are most likely familiar with the question “can you add these one or two fields to my report? I need them yesterday”. Those fields do not necessarily belong to the domain of the data product. At that moment, you need to decide between strictly guarding the bounded context of domains in your data product and building a new one, or pragmatically adding those fields to your original data product.

A third design challenge can be learned from software engineering. Just as microservices are an alternative to monoliths, data products can be seen as a similar move away from a monolithic data lake. In software engineering, you notice that going beyond micro is way too granular. The same holds for data products, and leads to another design decision to be made.

What about data contracts?

Remember Wannes referring to a data product as the data and everything you need to use it? And Jean-Georges stating that a data product is an assembly of data contracts? What are those data contracts?

A data contract prescribes everything you need to use the data. It contains elements on the datasets and schema, but also on data quality, SLAs and much more. A huge difference with data catalogs is that, for example, the schema is defined up-front and the contract is deployed together with the data product.

Illustration of a data contract, its principal contributors, sections, and usage — Image by Bitol
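To make this more tangible, here is a minimal sketch of a contract that carries a schema, one data quality rule and one SLA, and that can check data up-front. All class and field names (`Column`, `DataContract`, `max_null_fraction`, `freshness_hours`) are hypothetical and deliberately simplified; this is not the Bitol standard's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dtype: type
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    """Hypothetical, simplified contract; not the Bitol standard's schema."""
    dataset: str
    version: str
    owner: str
    columns: tuple[Column, ...]
    max_null_fraction: float  # assumed data quality rule for nullable columns
    freshness_hours: int      # assumed SLA; declared here, not checked below

    def violations(self, rows: list[dict]) -> list[str]:
        """Check rows against the schema and quality rule; return problems."""
        problems = []
        for col in self.columns:
            values = [row.get(col.name) for row in rows]
            nulls = sum(v is None for v in values)
            if not col.nullable and nulls:
                problems.append(
                    f"{col.name}: {nulls} null value(s) in non-nullable column"
                )
            if col.nullable and rows and nulls / len(rows) > self.max_null_fraction:
                problems.append(
                    f"{col.name}: null fraction above {self.max_null_fraction}"
                )
            if any(v is not None and not isinstance(v, col.dtype) for v in values):
                problems.append(f"{col.name}: expected {col.dtype.__name__}")
        return problems


contract = DataContract(
    dataset="pricing",
    version="1.0.0",
    owner="data-products",
    columns=(
        Column("sku", str),
        Column("price", float),
        Column("discount", float, nullable=True),
    ),
    max_null_fraction=0.5,
    freshness_hours=24,
)

good = [
    {"sku": "A1", "price": 9.99, "discount": None},
    {"sku": "B2", "price": 5.0, "discount": 0.1},
]
bad = [
    {"sku": "A1", "price": "9.99", "discount": None},
    {"sku": None, "price": 5.0, "discount": None},
]

print(contract.violations(good))
print(contract.violations(bad))
```

Because the contract is defined before the data arrives, the same object can be shipped with the data product and evaluated on every delivery, instead of documenting the schema after the fact.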

This is where Bitol comes into play: an open standard for data contracts, and hence data products. Data contracts address most elements of the FAIR principles (Findable, Accessible, Interoperable and Reusable) or the DATSIS principles, which were introduced by Zhamak Dehghani when first describing data mesh: Discoverable, Addressable, Trustworthy, Self-Describing, Interoperable, and Secure.

Who owns these data products and data contracts?

First of all, ownership is probably one of the main reasons for the existence of the concept of a data product. The need to increase maturity, which is tightly linked to it, has led to the concept of data contracts. As a data product is a deployable, atomic unit of data and everything you need to use it, it combines both technical and business ownership. This is quite different from organizational structures with central data teams.

As ownership is combined, it becomes easier to adopt another mindset of product thinking: you have the right to be wrong! You can then iterate on the product, which requires data product versioning. Where datasets owned by a central team often turn that team into a bottleneck for iterations, a dedicated data product team can iterate much faster.

With a data product, you have the right to be wrong.
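One possible way to make such iterations safe for consumers is to derive the version bump from the schema change. The sketch below assumes a semantic-versioning convention (major.minor.patch) that the panel did not prescribe; `required_bump` and `bump` are hypothetical helpers, and schemas are represented as plain column-to-type dictionaries.

```python
def required_bump(old: dict[str, str], new: dict[str, str]) -> str:
    """Classify a schema change as 'major', 'minor' or 'patch'.

    Assumed convention: removing or retyping a column breaks consumers
    (major); adding a column is backward compatible (minor).
    """
    removed = old.keys() - new.keys()
    retyped = {c for c in old.keys() & new.keys() if old[c] != new[c]}
    if removed or retyped:
        return "major"
    if new.keys() - old.keys():
        return "minor"
    return "patch"


def bump(version: str, kind: str) -> str:
    """Apply a 'major', 'minor' or 'patch' bump to a 'x.y.z' version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if kind == "major":
        return f"{major + 1}.0.0"
    if kind == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


# Hypothetical iterations of a pricing data product's schema.
v1 = {"sku": "string", "price": "float"}
v2 = {"sku": "string", "price": "float", "discount": "float"}  # column added
v3 = {"sku": "string", "price": "decimal", "discount": "float"}  # type changed

second = bump("1.0.0", required_bump(v1, v2))
third = bump(second, required_bump(v2, v3))
print(second, third)
```

With a rule like this, being wrong stays cheap: additive fixes ship as minor versions, while breaking corrections are announced through a major version that consumers can adopt at their own pace.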

At Otrium, this combined ownership is applied very strictly. Nikki’s team is technically responsible whenever someone interacts with the data. Of course she is not responsible for all the data assets she uses. Think of the data product chain example, where the products are color-coded by ownership. On the other hand, she is also responsible for the monetary value: if value is not delivered within one or two months, focus must be shifted from a budget perspective. When value has been proven fast, though, it also becomes easier to scale up the data product team and mature the product.

Data product thinking, and the ownership that comes with it, often results in, or is combined with, the desire to increase the number of people working with data in an organization. This often requires lowering the technical barrier, by introducing SQL or no-code platforms instead of Scala or Python, as well as explaining the software development life cycle. Both challenges can be solved with technology and processes, and are the focus of platforms like Conveyor. The hardest part remains defining the why of data products.

The incentive to own a data product

When introducing the concept of data product thinking in an organization, business teams might be enthusiastic about obtaining more freedom to work with data, but might also overlook the governance effort linked to it. For source-aligned data products, this might be even worse, as the owning teams might not see any value in their own domain.

Building and governing a data product can feel like a large burden — Photo by Pavitra Baxi on Unsplash

With regard to operational data, the asset from which a source-aligned data product is created, Jean-Georges and Wannes agree that even though there is an intrinsic difference in how you store and process the data, ownership applies to both operational and analytical data. Wannes points out that, in his opinion, the owner of the operational data should also own the respective source-aligned data products. These people should own both the process of offering the operational data for analytical reuse and the addition of business logic to it. It is not sufficient to change the ownership of data ingestion pipelines: dumps from operational databases require business knowledge to make that data valuable. A data product, on the other hand, should not depend on its consumers having such in-depth knowledge.

To get these source teams, and in fact all teams, on board, there are two main things to point out. First, think about use cases in your own domain that build on your analytical data. Second, start from your company strategy: even when you don’t see direct value in your own team or domain, you can be a crucial building block in a data product chain that brings huge value to your company.

Data products — Data value created throughout your organization

To wrap up, data products are the data and everything you need to use it. As such they combine both business and technical ownership, moving away from the concept of a central data team.

Every data product must be valuable, and there is a flywheel effect when data products are reused, resulting in a data product chain. To mature these interactions, we see the introduction of data contracts and standards like Bitol. Relevant metadata is captured up-front, and the required SLAs and more are agreed upon.

The hardest part is defining the why of every single data product and incentivizing teams, certainly those that do not see the value within their own domain. To do so, you should point out the entire data value chain and the flywheel effect of data product reusability.

Wannes Rosiers
Data Mesh Learning

Data Mesh Learning MVP. Currently building Conveyor, previously data engineering manager at DPG Media. Firm believer in the value of data.