Data Product Design: A UX Approach to the Future of Data

Kathleen Anderson
SAS Product Design
10 min read · Sep 9, 2024


Image generated via Microsoft Copilot

Introducing data products

Despite significant changes in the data management lifecycle over the past few decades, the goal of optimizing this process for better data, insights, and business outcomes remains the same. There is an abundance of literature on the topic, with new ideas emerging to meet the evolving requirements for making this lifecycle succeed.

One widely discussed idea on how to revolutionize the data management process is data products. While the idea of treating data as an asset has been discussed in various ways over the past decade, the term “data as a product” can be traced back to Zhamak Dehghani’s 2019 article How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh.

Dehghani identifies data as a product as a core principle in her concept of data mesh. Her overarching idea is that “domain data teams must apply product thinking with similar rigor to the datasets they provide; considering their data assets as their products and the rest of the organization’s data scientists, machine learning and data engineers as their customers.”

Product thinking, with its emphasis on solving problems and delivering value for users, would revolutionize the data management process. Viewing data as a product that needs to be managed, maintained, and optimized, just as physical products are, transforms the way people think about and use data. Dehghani goes on to specify some important elements for data products:

1. Qualities to define the user experience: Dehghani defines six characteristics for successful data products (a rough code sketch of how these qualities might be captured appears after this list):

Discoverable: Consumers, engineers, and scientists should be able to locate a dataset easily, so that it can be more quickly understood and consumed.

Addressable: Data products should have a unique address and share common conventions so that, once located, users can programmatically access them.

Trustworthy and truthful: Each data product should establish and guarantee its intended level of integrity and accuracy through a set of Service Level Objectives (SLOs).

Self-describing semantics and syntax: Data products should be built so they are easy to use, and can be independently discovered, understood, and consumed.

Inter-operable and governed by global standards: The key to effectively correlating data across different domains lies in adhering to specific standards and harmonization rules. These standardizations should be governed globally to ensure interoperability between diverse, multilingual domain datasets.

Secure and governed by global access control: Access to data products must be secure, with access-control policies governed globally and applied to each individual dataset.

2. Cross-functional teams: Another crucial component of data products is the people who work with them: data product owners and data engineers. Dehghani imagines data product owners managing the vision, roadmap, and lifecycle of data products. They ensure consumer satisfaction by defining success criteria and KPIs (key performance indicators) and by continually monitoring data for quality improvements. She stresses the importance of removing specialized silos to broaden the organization’s overall skillset. To ensure a successful, well-rounded product, different teams need to contribute their skills and perspectives, and data products are no different. A lack of collaboration produces imbalanced products and unsatisfied consumers, while “cross-skill pollination” helps organizations build better, higher-quality data assets.

3. Data infrastructure as a platform: A data pipeline is the series of processes that automate the extraction, transformation, and loading (ETL) of data from various sources to a destination for analysis or storage. To eliminate the duplicated effort and skills needed to operate each domain’s pipeline technology stack and infrastructure, organizations can build data infrastructure as a platform. The keys to creating this infrastructure are “(a) to not include any domain specific concepts or business logic, keeping it domain agnostic, and (b) make sure the platform hides all the underlying complexity and provides the data infrastructure components in a self-service manner.”
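To make these qualities a bit more concrete, here is a minimal, hypothetical sketch (in Python) of how a data product’s metadata might encode several of them. The class and field names are illustrative assumptions, not part of any SAS product or data mesh specification.

```python
# A hypothetical data product descriptor illustrating Dehghani's qualities.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceLevelObjective:
    metric: str    # e.g. "completeness" or "freshness_minutes"
    target: float  # the level of integrity/accuracy the product guarantees


@dataclass
class DataProduct:
    name: str                # discoverable: searchable in a catalog
    address: str             # addressable: unique, convention-based URI
    owner: str               # the data product owner accountable for it
    schema: Dict[str, str]   # self-describing syntax: column -> type
    description: str         # self-describing semantics, in plain language
    slos: List[ServiceLevelObjective] = field(default_factory=list)  # trustworthy and truthful
    access_roles: List[str] = field(default_factory=list)            # secure, globally governed access


# Example: a domain team publishing a customer-orders product.
orders = DataProduct(
    name="customer-orders",
    address="data://sales/customer-orders/v1",  # hypothetical addressing convention
    owner="sales-domain-team",
    schema={"order_id": "string", "amount": "decimal", "placed_at": "timestamp"},
    description="One row per confirmed customer order, refreshed hourly.",
    slos=[ServiceLevelObjective(metric="completeness", target=0.99)],
    access_roles=["analyst", "data-scientist"],
)
```

Even this small amount of structure makes the product discoverable in a catalog, addressable by convention, and explicit about the level of trust consumers can expect.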

While data as a product may initially seem like a technical solution to be implemented on an architectural level, upon further evaluation, we see that this concept is grounded in the strategy of applying product thinking to datasets. Dehghani’s recommendations of promoting consumer satisfaction, uniting teams, and designing a self-service platform are all tasks that UX design can facilitate, decoupled from platform architecture.

So, might there be a way to implement data products into the data management process without needing to completely reconstruct the architecture underneath, by focusing on a UX-oriented solution to the problem?

What problems do data products solve?

An example of siloed ownership, the separation of technology from business departments. Image by Kathleen Anderson

To effectively implement data products as a solution, it is essential to first understand what problems they must resolve. Where do these problems originate in the broader data management system? Rather than addressing surface-level issues, the goal is always to design solutions for root causes.

In the process of identifying problems, patterns often emerge. During our multimethod research, which spanned academic writing, articles by thought leaders in the data product space, and conversations with subject matter experts here at SAS, one pattern appeared repeatedly: siloed and hyper-specialized ownership.

Dehghani mentions siloed and hyper-specialized ownership as a shortcoming of the current enterprise data platform architecture in How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh. When the teams who build and own data platforms are hyper-specialized and siloed from the operational units where data originates and is used, the result is engineers who lack business and domain knowledge. Dehghani contends that engineers “need to provide data for a diverse set of needs […] without a clear understanding of the application of the data and access to the consuming domain’s experts.” With teams operating in their own spheres and little to no communication between them, organizations struggle to provide accurate and meaningful data.

Chad Sanderson and Mark Freeman similarly discuss the harm of siloing in their book Data Contracts. When teams split into multiple silos with different strategies and work less closely with one another, the resulting breakdown in communication causes the data quality problems known as “Garbage In, Garbage Out” (GIGO). Sanderson and Freeman warn that GIGO can cost organizations in a multitude of ways, including lost revenue, operational inefficiencies, and customer dissatisfaction.

When I spoke with data management subject matter experts (SMEs) at SAS, people who talk with customers about their pain points daily, the issue of siloed work also emerged. Organizations often begin the data management process by onboarding and analyzing large amounts of data, which makes it hard to decipher what data is important to the organization and what is just cluttering up the pipeline. As a result, organizations suffer GIGO problems exactly like the ones Sanderson and Freeman describe.

The bottom line is that data without purpose is meaningless. Data needs to serve a specific function within the context of the organization; therefore, it is important to attach business objectives to data early in the data management pipeline. This attitude is mirrored in Dehghani’s concept of data product owners, who give data purpose by furnishing it with a vision and KPIs and by monitoring its journey through the pipeline.

While data quality is the surface-level problem that data products solve, the root cause of this problem runs deeper. It stems from a disconnect that Dehghani summarizes well: “Current architecture and organizational structure does not scale nor deliver the promised value of creating a data-driven organization.”

How DO you create a data-driven organization?

Siloed and hyper-specialized ownership is a large problem for many organizations, and it has little to do with the underlying architecture the data management system runs on; it is about the people who attend to the system. Engineers are isolated from other engineers, and business stakeholders are not collaborating with technical users to make sure data entering the pipeline has a purpose.

I argue that a UX-forward approach to implementing data products into the data management lifecycle presents a new perspective on a widely discussed pain point. This approach would align teams in a way that is decoupled from the platform architecture, without requiring an overhaul of the system’s foundations.

What I imagine is a solution that enables data products to act as bridges that unite people, facilitating communication and collaboration between typically separate teams. That connection translates into meaningful, data-driven business objectives established at the beginning of the data management lifecycle.

With the correct implementation, data products can serve as the bridge between business and technical stakeholders. Image by Kathleen Anderson

My theory, based on the research my team conducted, is that if the people who attend to data along the pipeline do not work together, it doesn’t matter how much work you put into perfecting the architecture and underlying platform; there will still be problems. This is because the true value of data cannot be fully realized without people making sense of where it fits into real-world objectives.

So, what if we started by tackling the people-oriented issue of siloed and hyper-specialized ownership? Then, after resolving this, address any remaining — and likely fewer — issues?

Implementing data products with a design-first strategy

This summer my team focused on integrating data products into Information Catalog, a SAS-designed tool that helps users organize, manage, and discover data assets within their organization by providing metadata management and search capabilities. While we imagine data products to be present throughout the pipeline, I selected one specific problem to solve: the siloing of business stakeholders from technical users.

This project offers a different, human-centered approach to implementing data products. I present the following examples to demonstrate how data products can be imagined as a user experience. This makes the data management lifecycle a more accessible process, with the goal of facilitating collaboration and ensuring trustworthy data that provides meaningful insights to the organization.

The following are some important attributes of our data product concept.

A new way to define and monitor data

Early on, it became clear that data products would exist as an entirely new feature within Information Catalog. After the dataset(s) are attached to a data product, the product serves as a “container” that facilitates all data-related operations.

The main aim here is to create cross-organization collaboration that inspires meaningful data collection and governance, helping businesses maximize opportunities and reach their goals.

What would this look like? We worked to create a feature where users can collaboratively define and manage data products. We made our design choices carefully so that both technical and non-technical stakeholders can navigate the feature easily, aiming for alignment, clear communication, and transparency so that data can be managed and used to its fullest potential.

The ultimate job of data product pages is to allow users to find data, develop requirements for their data needs, and subsequently govern this data responsibly and effectively.
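As a rough illustration of the “container” idea, the sketch below shows a data product that groups catalog datasets together with requirements contributed by business stakeholders and the technical stewards who govern the data. The class and its methods are hypothetical and not the actual Information Catalog API.

```python
# A hypothetical "container" for collaborative definition and governance of a data product.
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataProductContainer:
    name: str
    datasets: List[str] = field(default_factory=list)      # catalog asset identifiers
    requirements: List[str] = field(default_factory=list)  # contributed by business stakeholders
    stewards: List[str] = field(default_factory=list)      # technical users who govern the data

    def attach_dataset(self, asset_id: str) -> None:
        """Attach an existing catalog dataset so data operations flow through the product."""
        self.datasets.append(asset_id)

    def add_requirement(self, text: str) -> None:
        """Record a business-facing requirement in plain language."""
        self.requirements.append(text)


# A business analyst and a data engineer working in the same container.
product = DataProductContainer(name="quarterly-financial-reporting")
product.add_requirement("Needs three years of audited revenue figures per client.")
product.attach_dataset("catalog://finance/revenue_ledger")
product.stewards.append("finance-data-engineering")
```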

Begin with business objectives

Business objectives importantly inform other parts of the data pipeline. Image by Kathleen Anderson

It is no good for engineers to commit time and effort delivering data from one end of the pipeline to the other only to be told by the business, “this isn’t what we want.” This is inefficient and ultimately hurts organizations.

Business stakeholders must define their objectives early in the data management lifecycle to ensure that the work the engineers do provides meaning to the organization. Drawing on Dehghani’s idea of data product owners, we wanted to design a process with a designated space for business-oriented users to think about organizational impact and specify what data they need to further their objectives, such as creating financial reports for clients or complying with audits.

We decided to integrate this process into the creation of data contracts. Typically created at the beginning of the pipeline, these contracts serve as formal agreements that map out how data is structured, formatted, and exchanged. Our reasoning was that (1) this would embed business objectives into the beginning of the pipeline and (2) clearly specifying what purpose the data would serve would strengthen a contract’s importance.
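For illustration, a data contract with business objectives embedded might carry fields along these lines. The structure is an assumption about what such a contract could contain, not a SAS-defined format.

```python
# A minimal sketch of a data contract that states business objectives up front,
# alongside the usual structural and delivery guarantees.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataContract:
    producer: str                   # the domain team supplying the data
    consumer: str                   # the business unit requesting it
    business_objectives: List[str]  # why the data is needed, stated at the start of the pipeline
    schema: Dict[str, str]          # how the data is structured: column -> type
    format: str                     # how it is exchanged, e.g. "parquet"
    delivery_frequency: str         # how often it is refreshed
    quality_checks: List[str] = field(default_factory=list)


contract = DataContract(
    producer="sales-domain-team",
    consumer="finance-reporting",
    business_objectives=[
        "Produce quarterly financial reports for clients",
        "Support annual compliance audits",
    ],
    schema={"client_id": "string", "revenue": "decimal", "quarter": "string"},
    format="parquet",
    delivery_frequency="quarterly",
    quality_checks=["no null client_id", "revenue >= 0"],
)
```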

Wielding the power of AI

Business objectives can be added after dataset(s) have been attached to data products; however, we also see a large benefit in being able to initiate the data journey directly from business objectives. This would minimize GIGO problems early on and ensure that the data entering the pipeline has a specified use further downstream. To do this, we would want business stakeholders to be able to initiate a data contract.

However, the process of constructing data contracts is often a technical one, requiring things like advanced query languages, data modeling, and ETL tools. How to maneuver around this technical barrier? We began to research how Generative AI could present an opportunity for nontechnical users to contribute to creating data contracts.

While I cannot currently share what this interface looks like from a user’s perspective, I can share more about what this model would include from a technical perspective:

  1. Retrieval Augmented Generation (RAG): Retrieves relevant information from business data sources based on the user’s natural language query, generating accurate and contextually relevant answers.
  2. Utilizing an SLM and LLM (small and large language models): Combines the strengths of retrieval models and generative models, making it highly effective in scenarios where information needs to be precise.
  3. Privacy-focused: The LLM is only used to map the user’s natural language prompt to appropriate tools, bypassing compliance regulations that prohibit private data from being passed to external LLMs. The locally running SLM receives the prompt, interprets it, and generates a response that appears on the user’s screen, based on information coming from secure data centers.

If you want to learn more, this article presents a model similar to the one we envision powering this feature.
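To sketch how this privacy-focused routing could hang together, the stub functions below separate the three roles described above: an external LLM that only ever sees the natural-language prompt (to pick a retrieval tool), a retrieval step that runs against secure in-network sources, and a locally hosted SLM that drafts the answer from the retrieved context. All function names and bodies are hypothetical placeholders, not real library calls or the actual design.

```python
# A rough sketch of prompt routing that keeps private data away from external LLMs.
from typing import List


def external_llm_choose_tool(prompt: str, tool_names: List[str]) -> str:
    """Hypothetical stand-in: a real implementation would send only the prompt to a
    hosted LLM and let it choose a tool; no private data leaves the network."""
    return tool_names[0]  # placeholder choice


def retrieve_from_secure_source(tool_name: str, prompt: str) -> List[str]:
    """Hypothetical stand-in for the retrieval step of RAG: query secure, in-network
    data sources with the chosen tool and return relevant records or passages."""
    return [f"[records returned by {tool_name} for: {prompt}]"]


def local_slm_generate(prompt: str, context: List[str]) -> str:
    """Hypothetical stand-in for a locally hosted SLM that drafts the contract text
    from the prompt plus the retrieved context (the generation step of RAG)."""
    return f"Draft contract based on {len(context)} retrieved item(s)."


def draft_contract_from_prompt(prompt: str, tools: List[str]) -> str:
    tool = external_llm_choose_tool(prompt, tools)       # LLM only maps prompt -> tool
    context = retrieve_from_secure_source(tool, prompt)  # private data stays in-network
    return local_slm_generate(prompt, context)           # local SLM produces the answer


print(draft_contract_from_prompt(
    "I need three years of audited revenue per client for quarterly reports",
    ["search_financial_datasets", "search_customer_datasets"],
))
```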

Conclusion

This article explains a different approach to data products and introduces a specific use case in which they can facilitate the creation of data contracts in a more accessible and collaborative way.

We are currently working towards integrating data products into other areas of the data management lifecycle to further improve how stakeholders define, manage, deliver, and govern data. For example, we are exploring how to incorporate data products into a monitoring system that detects and mitigates bias in datasets, ensuring fair and inclusive data usage.

I believe there is power in adopting new perspectives on widely discussed topics. Thinking about data products in a UX-forward way unveils new avenues and approaches to solutions in the data product lifecycle.

I hope my project and this article can inspire discourse on how to think about data products in a different light, prioritizing user needs and human-centered design.

