Data-Products, Black Holes and a Washing Machine
Understand how the enterprise can build a tangible definition of Data Products. How can we make the necessary mindset shift to operate data as a product?
Black holes are incredibly mysterious and alluring to the curious mind. We know they are there not because we can see them or even imagine what they are, but because we can observe their effect on the universe.
There are no black holes — in the sense of regimes from which light can’t escape to infinity. There are however apparent horizons which persist for a period of time.
Stephen Hawking
To the average Joe like myself, Stephen’s beautiful words are somewhat bemusing. For a while, I had the same curiosity and bemusement about data-products. Lots of people are talking about them and writing about them, so they must exist. But I can’t see them or visualise them, so what are they?
This time Stephen has some more comforting words for us.
If you feel you are in a black hole, don’t give up; there’s a way out.
Stephen Hawking
Yes Stephen, I do feel in the dark when it comes to data-products. I want to find the way out.
Data-Products
DJ Patil, former United States Chief Data Scientist, defined a data product as “a product that facilitates an end goal through the use of data”.
Why didn’t my parents call me DJ? Such a cool name. I could have been a…no wait, that doesn’t work. Mr Patil offers a good succinct starting point. What else is out there?
Well, we must turn to the matriarch of the data-mesh, Zhamak Dehghani. In her numerous words of wisdom, she bestows upon us six facets of a data-product, later summarized by Sven Balnojan as DATSIS in this post.
- Discoverable
- Addressable
- Trustworthy
- Self-describing
- Interoperable
- Secure
Great, we have a framework of principles for defining a data-product.
A Washing Machine
Recently our trusty steed spun its last spin. It’s been a dedicated trooper for many a year, but it was finally time to rest. So when I started seeking a replacement, I thought about that important term ‘data as an asset’, aka ‘data as a product’, and how this relates to any product.
Discoverable
I needed a portal to discover the Washing Machine of my dreams. I needed search capabilities and some high-level information covering the most important properties of the product. Search and properties were the ingredients I needed to decide whether the washing machine was right for me.
Addressable
The washing machine's dimensions, power rating and loading method told me that it would work within my existing kitchen infrastructure, style and laundry processes. Once home I could access all its features without friction.
Trustworthy
Well, well, it had a 4.5 star rating from other consumers, and it was sold by a reputable manufacturer with a five-year warranty.
This product’s going in the basket, time to check out.
Self-describing
Once home, the packaging explained there was a washing machine inside and how to access it. The documentation shipped with the appliance explained all the details related to the product.
Interoperable
I operate the functions of the machine both through in-built controls and remotely via the home wi-fi. The quick-start guide made this an easier experience.
Secure
Can you get that door open before it’s good and ready for you to open it? Not a chance, you’ll need to wait patiently.
Now that I’ve made some careful observations about the new appliance in my life, can I turn this product understanding into data-product understanding?
Digital Data-Product Contract
If we want the enterprise to understand and love data-products we can’t give them an enigma. We must give them something they can see and touch.
No more conceptual thinking or theory, let's get real. So is a data-product a file, dataset, OData, Graph or Cognitive Service API? Is it a data-warehouse star schema, BI datamart, ML model or NoSQL DB? If we recall DJ Patil’s principal statement, it can be any of these things. The first lesson we must understand is that data-products can come in many types. Just as we have many home appliances to suit all our needs.
Our enterprise data-product definition itself must be interoperable so we can weave it into the fabric of our data governance. We must produce a meta-data artefact: a Digital Data-Product Contract.
Open API standards provide a very good blueprint for organising information as a programmatically addressable artefact. I’ll use YAML to describe a logical Digital Data-Product Contract. I’m not advocating the use of Open API standards or YAML here, that would be a whole other post. I’m simply utilising them to create an example. Other methods of digitally defining a Data-Product are plausible.
Example Digital Data-Product Contract
There is a lot going on in this YAML contract. So I’ll highlight some of the important elements and how they relate to the DATSIS convention.
---
dependencies:
  -
    info:
      contact:
        email: pimteam@myorg.com
        url: "https://myorg.com/pimwiki.html"
      description: "Data for this product is sourced from the Product Identification Management service."
      termsOfService: "https://myorg.com/pimwiki/sitereliability.html"
      title: "PIM Service"
      version: "2.1.0"
      cadence: "Hourly"
      provenanceLineage: "https://myorg.com/pimwiki/lineage.html"
  -
    info:
      contact:
        email: ordersteam@myorg.com
        url: "https://myorg.com/orderswiki.html"
      description: "Data for e-commerce orders is sourced from the Order Management service."
      termsOfService: "https://myorg.com/orderswiki/sitereliability.html"
      title: "Orders Service"
      version: "1.5.0"
      cadence: "Continuous"
      provenanceLineage: "https://myorg.com/ordermanagementwiki/lineage.html"
info:
  contact:
    email: product360team@myorg.com
    url: "https://myorg.com/product360wiki.html"
  description: "A universal perspective of products transacted by the organisation."
  termsOfService: "https://myorg.com/product360wiki/sitereliability.html"
  title: "Product 360 Foundation Data Product"
  version: "1.0.0"
  type: "Deltalake Dataset"
  location: "lakeprotocol://productdomain/product360"
  gettingStartedGuide: "https://myorg.com/product360basicinterop-notebook.html"
  accessrequestprocess: "https://myorg.com/product360accessrequestform.html"
  schema:
    title: "Product 360 Data Schema"
    schemaversion: "1.0.0"
    type: object
    properties:
      productid:
        type: string
        format: uuid
      statefulperiod:
        type: string
        format: date-time
      productname:
        type: string
      productperformancescore:
        type: float
    keying:
      -
        property: productid
        description: "The productid is part of the composite key that uniquely identifies a record."
      -
        property: statefulperiod
        description: "The productid and statefulperiod uniquely identify a product's performance score within a bounded or unbounded period."
    sequencing:
      -
        property: statefulperiod
        description: "The statefulperiod is a series of adjacent periods of time without gaps. The series spans the very first time a data-point was established until now. Each period is punctuated by the event time of any source dependency data changes."
  certification:
    informationclassification: "Commercially sensitive"
    datatreatments:
      -
        greatexpectationstest: "Result (56/56) passed"
        version: "3.1"
        lastrundatetime: "15/07/22"
      -
        dataanomalydetectionframework: "All data-points within expected tolerances."
        version: "2.0"
        lastrundatetime: "15/07/22"
dependants:
  -
    info:
      contact:
        email: appliancegrowthanddevelopmentteam@myorg.com
        url: "https://myorg.com/appliancedevelopmentwiki.html"
      description: "The appliance commercial growth and development team use this data-product."
      title: "The Appliance Growth and Development BI Dashboard"
      version: "3.1.0"
      cadence: "4 hours"
      provenanceLineage: "https://myorg.com/appliancedevelopmentwiki/lineage.html"
  -
    info:
      contact:
        email: tradingoptimisationteam@myorg.com
        url: "https://myorg.com/tradingoptimiserwiki.html"
      description: "The trading optimisation team uses this data-product and other data-products to optimise sales revenues."
      termsOfService: "https://myorg.com/tradingoptimiserwiki/sitereliability.html"
      title: "Trading Optimiser"
      version: "1.5.0"
      cadence: "Continuous"
      provenanceLineage: "https://myorg.com/tradingoptimiserwiki/lineage.html"
Discoverable
This digital information can easily be utilised in indexing and searching technologies, the kind of tech you’ll find in the array of data governance tools on the market. The information is summarised, giving the explorer enough information to quickly form an understanding of the product. However, more detailed information is also available through hyperlinks, allowing the explorer to ensure the product meets their needs.
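To make the indexing idea a little more tangible, here is a minimal Python sketch that builds a tiny inverted index over the summary fields of a contract. It assumes the YAML has already been parsed into a dictionary. The names `build_index`, `search` and `INDEXED_FIELDS` are my own illustrative choices, not the API of any governance tool.

```python
import re
from collections import defaultdict

# Hypothetical: a parsed fragment of the contract's info block.
contract = {
    "info": {
        "title": "Product 360 Foundation Data Product",
        "description": "A universal perspective of products transacted by the organisation.",
        "type": "Deltalake Dataset",
        "location": "lakeprotocol://productdomain/product360",
    }
}

# Assumption: these summary fields are the ones worth searching.
INDEXED_FIELDS = ("title", "description", "type")


def build_index(contracts):
    """Map each lower-cased word to the locations of the contracts that mention it."""
    index = defaultdict(set)
    for c in contracts:
        info = c["info"]
        for field in INDEXED_FIELDS:
            text = str(info.get(field, "")).lower()
            for word in re.findall(r"[a-z0-9]+", text):
                index[word].add(info["location"])
    return index


def search(index, query):
    """Return the locations of contracts that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


index = build_index([contract])
print(search(index, "product 360"))
```

A real catalogue would add ranking, stemming and access filtering, but the principle is the same: the contract is structured enough to be indexed without anyone reading it first.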
Addressable
The contact, version, type and location information help me understand where and how the data resides.
info:
  contact:
    email: product360team@myorg.com
    url: "https://myorg.com/product360wiki.html"
  version: "1.0.0"
  type: "Deltalake Dataset"
  location: "lakeprotocol://productdomain/product360"
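Because the location is a URI, a consumer can take it apart programmatically. The sketch below uses Python's standard `urllib.parse` to do that. Note that `lakeprotocol` is only the placeholder scheme from my example contract, and `resolve_address` is a hypothetical helper name.

```python
from urllib.parse import urlparse

# Hypothetical: the addressable fields of the parsed contract.
info = {
    "version": "1.0.0",
    "type": "Deltalake Dataset",
    "location": "lakeprotocol://productdomain/product360",
}


def resolve_address(info):
    """Split the contract's location into protocol, domain and product path."""
    parsed = urlparse(info["location"])
    return {
        "protocol": parsed.scheme,
        "domain": parsed.netloc,
        "product": parsed.path.lstrip("/"),
        "type": info["type"],
        "version": info["version"],
    }


address = resolve_address(info)
print(address)
```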
Trustworthy
The lineage, terms of service, dependencies and data treatment information generate tangible evidence that instils trust in data-product consumers.
---
dependencies:
  -
    info:
      cadence: Hourly
      contact:
        email: pimteam@myorg.com
        url: "https://myorg.com/pimwiki.html"
      description: "Data for this product is sourced from the Product Identification Management service."
      provenanceLineage: "https://myorg.com/pimwiki/lineage.html"
      termsOfService: "https://myorg.com/pimwiki/sitereliability.html"
      title: "PIM Service"
      version: "2.1.0"
info:
  certification:
    datatreatments:
      -
        greatexpectationstest: "Result (56/56) passed"
        lastrundatetime: 15/07/22
        version: "3.1"
      -
        dataanomalydetectionframework: "All data-points within expected tolerances."
        lastrundatetime: 15/07/22
        version: "2.0"
    informationclassification: "Commercially sensitive"
  contact:
    email: product360team@myorg.com
    url: "https://myorg.com/product360wiki.html"
  description: "A universal perspective of products transacted by the organisation."
  termsOfService: "https://myorg.com/product360wiki/sitereliability.html"
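Evidence only builds trust while it is fresh. As a rough sketch, a consumer could check how long ago each data treatment last ran. I'm assuming here that `lastrundatetime` is written day/month/year, and that each treatment is named by whichever key is not `version` or `lastrundatetime`. The function name and the one-day threshold are illustrative only.

```python
from datetime import date, datetime, timedelta

# Hypothetical: the parsed certification block of the contract.
certification = {
    "informationclassification": "Commercially sensitive",
    "datatreatments": [
        {
            "greatexpectationstest": "Result (56/56) passed",
            "version": "3.1",
            "lastrundatetime": "15/07/22",
        },
        {
            "dataanomalydetectionframework": "All data-points within expected tolerances.",
            "version": "2.0",
            "lastrundatetime": "15/07/22",
        },
    ],
}


def stale_treatments(certification, today, max_age_days=1):
    """Return the names of treatments whose last run is older than max_age_days."""
    stale = []
    for treatment in certification["datatreatments"]:
        # Assumption: dates are day/month/two-digit-year.
        last_run = datetime.strptime(treatment["lastrundatetime"], "%d/%m/%y").date()
        # The treatment's name is the key that is not bookkeeping metadata.
        name = next(k for k in treatment if k not in ("version", "lastrundatetime"))
        if today - last_run > timedelta(days=max_age_days):
            stale.append(name)
    return stale


print(stale_treatments(certification, today=date(2022, 7, 20)))
```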
Self-describing
The compound effect of defining the various elements of the digital contract means that the product can be intuitively used. There should be no need for excessive data familiarisation.
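One way to test that compound effect is a simple completeness check: list the contract fields each facet leans on, then report the gaps. The mapping below is my own reading of the sections in this post, not a standard, and `facet_gaps` and `is_self_describing` are hypothetical helper names.

```python
# Assumption: which info fields evidence which facet, loosely following this post.
FACET_FIELDS = {
    "Discoverable": ["title", "description"],
    "Addressable": ["contact", "version", "type", "location"],
    "Trustworthy": ["termsOfService", "certification"],
    "Interoperable": ["accessrequestprocess", "gettingStartedGuide", "schema"],
    "Secure": ["certification", "accessrequestprocess"],
}


def facet_gaps(info):
    """Return {facet: [missing fields]} for every facet with at least one gap."""
    gaps = {}
    for facet, fields in FACET_FIELDS.items():
        missing = [field for field in fields if field not in info]
        if missing:
            gaps[facet] = missing
    return gaps


def is_self_describing(info):
    """Treat the contract as self-describing only when no facet has gaps."""
    return not facet_gaps(info)


# Hypothetical draft contract that has not yet published a location or a schema.
draft_info = {
    "title": "Product 360 Foundation Data Product",
    "description": "A universal perspective of products transacted by the organisation.",
    "contact": {"email": "product360team@myorg.com"},
    "version": "1.0.0",
    "type": "Deltalake Dataset",
    "termsOfService": "https://myorg.com/product360wiki/sitereliability.html",
    "certification": {"informationclassification": "Commercially sensitive"},
    "accessrequestprocess": "https://myorg.com/product360accessrequestform.html",
    "gettingStartedGuide": "https://myorg.com/product360basicinterop-notebook.html",
}

print(facet_gaps(draft_info))
print(is_self_describing(draft_info))
```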
Interoperable
The access request process, schema info, and getting started guides mean that consumers can interact with the data-product without friction.
info:
  accessrequestprocess: "https://myorg.com/product360accessrequestform.html"
  gettingStartedGuide: "https://myorg.com/product360basicinterop-notebook.html"
  schema:
    keying:
      -
        description: "The productid is part of the composite key that uniquely identifies a record."
        property: productid
      -
        description: "The productid and statefulperiod uniquely identify a product's performance score within a bounded or unbounded period."
        property: statefulperiod
    properties:
      productid:
        format: uuid
        type: string
      productname:
        type: string
      productperformancescore:
        type: float
      statefulperiod:
        format: date-time
        type: string
    schemaversion: "1.0.0"
    sequencing:
      -
        description: "The statefulperiod is a series of adjacent periods of time without gaps. The series spans the very first time a data-point was established until now. Each period is punctuated by the event time of any source dependency data changes."
        property: statefulperiod
    title: "Product 360 Data Schema"
    type: object
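With the schema and keying published, a consumer can check records before relying on them. The sketch below is a deliberately small validator, not a full schema engine. It assumes the `statefulperiod` value is an ISO 8601 timestamp marking the start of the period, and the function names and sample record are made up for illustration.

```python
from datetime import datetime
from uuid import UUID

# Hypothetical: the parsed schema block of the contract.
schema = {
    "properties": {
        "productid": {"type": "string", "format": "uuid"},
        "statefulperiod": {"type": "string", "format": "date-time"},
        "productname": {"type": "string"},
        "productperformancescore": {"type": "float"},
    },
    "keying": [{"property": "productid"}, {"property": "statefulperiod"}],
}


def validate_record(record, schema):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, spec in schema["properties"].items():
        if name not in record:
            problems.append(f"missing property: {name}")
            continue
        value = record[name]
        if spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"{name} should be a string")
        elif spec["type"] == "float" and not isinstance(value, float):
            problems.append(f"{name} should be a float")
        elif spec.get("format") == "uuid":
            try:
                UUID(value)
            except ValueError:
                problems.append(f"{name} is not a valid uuid")
        elif spec.get("format") == "date-time":
            try:
                datetime.fromisoformat(value)  # assumption: ISO 8601 timestamps
            except ValueError:
                problems.append(f"{name} is not a valid date-time")
    return problems


def duplicate_keys(records, schema):
    """Return composite keys that appear more than once across the records."""
    key_props = [k["property"] for k in schema["keying"]]
    seen, dupes = set(), []
    for record in records:
        key = tuple(record[p] for p in key_props)
        if key in seen:
            dupes.append(key)
        seen.add(key)
    return dupes


good = {
    "productid": "123e4567-e89b-12d3-a456-426614174000",
    "statefulperiod": "2022-07-15T10:00:00",
    "productname": "Washing Machine",
    "productperformancescore": 4.5,
}
print(validate_record(good, schema))
```

The keying information earns its place here too: `duplicate_keys` uses it to confirm that `productid` plus `statefulperiod` really does identify a single record.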
Secure
The information classification and access request process information demonstrate that the product is secure and ready for responsible use. Secure, not imprisoned. Defining security measures that don’t expose access control processes means the data is locked away. The organisation then fails to capitalise on the information value.
info:
  certification:
    informationclassification: "Commercially sensitive"
  accessrequestprocess: "https://myorg.com/product360accessrequestform.html"
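The "secure, not imprisoned" idea can be written as a small rule: a classified product should also publish its route to access. Here is a hedged sketch of that rule. The function name and the three status labels are hypothetical, and it assumes the classification sits under a `certification` block as in the full contract.

```python
def security_posture(info):
    """Classify a contract as secure, imprisoned or unclassified."""
    classification = info.get("certification", {}).get("informationclassification")
    request_process = info.get("accessrequestprocess")
    if not classification:
        return "unclassified"
    if not request_process:
        # Classified, but consumers have no published route to request access.
        return "imprisoned"
    return "secure"


# Hypothetical: the security-related fields of the parsed contract.
info = {
    "certification": {"informationclassification": "Commercially sensitive"},
    "accessrequestprocess": "https://myorg.com/product360accessrequestform.html",
}
print(security_posture(info))
```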
Closing Thoughts
It’s been a while since I wrote anything so this has been a joy to ink. If you came this far down the page I also hope you have enjoyed the content.
My main objective was to demystify the Data-Product. If I have achieved that, then awesome. If I have ignited conversation within your technical communities then that's even better.
I’d like to give thanks to two other writers and encourage you to peruse their work. I’ve learnt a lot from their musings.
Piethein Strengholt: https://piethein.medium.com/
Eric Broda: https://medium.com/@ericbroda