5 Key Ingredients for Data Products

Published in Sclable · 8 min read · Dec 21, 2022

Ingredients that are also easy to miss.

Everyone is talking about data and AI, with 94% of business leaders believing that AI is essential to their success in the next five years. In addition, private investment in AI more than doubled from 2020 to 2021. But can current data & AI solutions deliver on their promises?

With 7 out of 8 data science projects never making it into production, the challenges seem to crop up early, before these projects even get the chance to provide business impact. The ones that do make it into production often fail to deliver on the ambitious goals they set out to achieve, which leads us to the question: What are the key factors that contribute to successful data products & services?

In this article, we provide you with insights into the five essential ingredients that will help you secure this success and avoid some of the most common pitfalls. These ingredients are often overlooked and are only partially connected to the data itself (or its processing, modeling and operations).

Note: To keep it simple, we’ll use ‘data products’ to refer to AI products & services powered by data from this point onwards.

1. Data products need to be developed as digital products

Data products have distinctive qualities but at their core they are still digital products and should be developed accordingly, which means considering usability, value, viability and feasibility throughout. A suitable method is needed to ensure that the result addresses the right problems and generates real value for customers and users.

Combine design, data & code

Based on our experience, the most value can be gained when all three perspectives of design, data and code are combined when conceptualizing and building a digital product. This means that user centricity, data centricity and technical expertise need to go hand in hand, including close collaboration on all important decisions.

Simply stated, neglecting the design or code aspects of a data product drastically increases its risk of failure.

Ensure the product is usable, valuable, viable & feasible

When applying the collaborative method, user experience and service design create a desirable experience to ensure that users love the product. Business design and data analytics identify the value and viability of the result. Data science and software development bring the concept to a feasible, implemented and living product, ready to be used in daily business.

2. Data products need to be business relevant

The viability of a data product is perhaps the most difficult aspect to get right, and often the most important one for a business to justify the financial investment. Proving viability can be centered on metrics that you actively work to influence and that show how your product contributes to the overall vision and strategy of the business.

Relate to the business

Who is the respective customer/user and how does the product help customers/users to achieve their goals while still working for the business? The customer/user, the business and the data perspective need to be jointly considered to identify the relation and desired impact.

Identify how the business value can be forecasted

Leading indicators are a good tool for forecasting business value, although they are sometimes hard to define. Think of them as measures of current activities that drive the future value of your business, e.g. the number of error states of a machine per hour to predict product quality.

Leading indicator = indicator/metric that can be used to predict developments (ex-ante)
Lagging indicator = indicator/metric that measures results or outcomes (ex-post)
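One simple first check of whether a candidate metric really leads an outcome is to correlate it with the outcome shifted by a time lag. The following is a minimal sketch, not a full forecasting method: the function names and all numbers are made up for illustration, and a high correlation is only a hint that the indicator is worth investigating, not proof that it predicts anything.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(leading, lagging, lag):
    """Correlate leading[t] with lagging[t + lag] (lag in time steps)."""
    if len(leading) != len(lagging):
        raise ValueError("both series must cover the same time steps")
    if lag < 0 or lag >= len(leading):
        raise ValueError("lag must be between 0 and len(series) - 1")
    return pearson(leading[: len(leading) - lag], lagging[lag:])


# Illustrative, invented data: machine error states per hour (candidate
# leading indicator) and the defect rate measured in the same hour
# (lagging indicator).
errors_per_hour = [2, 5, 3, 8, 4, 9, 1]
defect_rate = [0.01, 0.02, 0.05, 0.03, 0.08, 0.04, 0.09]

for lag in (0, 1, 2):
    r = lagged_correlation(errors_per_hour, defect_rate, lag)
    print(f"lag={lag}h correlation={r:.2f}")
```

In this invented data set the defect rate follows the error count with a delay of one hour, so the correlation is strongest at a lag of 1. With real data the relationship will be noisier, and the lag itself is something to explore.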

Judge the impact

It is important to consider quantitative (e.g. throughput time) as well as qualitative data (e.g. customer satisfaction from user interviews) to judge the impact of a product.

Both types have to be integrated, as quantitative data alone usually does not explain why something works or doesn’t.

3. Data products need to be trustworthy

The users’ trust is essential for the success of any user-facing product. This is especially important for data products, which try to reduce complexity by automation or recommendation and often work on personal data — requiring a very high level of trust. Building trustworthy products requires dedication and a holistic approach.

Use available guidelines and frameworks

Many different guidelines & frameworks for trustworthy AI have already been developed (for surveys see e.g. the collection by Thiebes et al. or ISO/IEC TR 24028:2020). They provide a good overview on which dimensions of trustworthiness exist and how to consider them for development, deployment and use in your company and use case.

Know the role of good user experience and design for AI

For the purpose of transparency and explainability, it is crucial to involve designers who are specialized in considering your users’ mental model(s). Onboarding is the first critical step in the user flow. Take an incremental approach and clearly state the users’ benefits, in particular when asking for consent on sharing data. Error cases need to be considered and the system state should be transparent at all times. Microsoft’s HAX toolkit is a great source to further explore this topic.

Consider technical robustness as a must-have

Users expect data products to perform as intended under all circumstances. This makes robustness a threshold (must-be) feature in the Kano model: users take it for granted, so robustness cannot raise their satisfaction, but a lack of it substantially increases the risk of the product failing. Therefore, designing the product for robustness (as well as reliability) and verifying it with appropriate testing is essential.

4. Data products need to be in production

Many data products get stuck in a Proof-of-Concept (PoC) status. Given that data products work on, well, data to provide business impact, it is essential that (promising) data products go into production as soon as possible. Sure, this sounds obvious, but it remains one of the most common and greatest challenges.

Focus on the most promising solutions

Since resources for creating, developing and operating data products are limited, focusing available resources on the most promising solutions is essential. It is during the PoC phase that the technical feasibility and the business impact are validated. With clear criteria (ideally in the form of set goals and metrics) it is possible to judge the feasibility and viability based on the achieved results before going into an MVP (Minimum Viable Product) stage. Development should be stopped and the involved team refocused on the more promising solution(s) if feasibility or viability cannot be proven. The aim is to use available resources as efficiently as possible.
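Such a go/no-go decision is easier to make when the criteria are written down before the PoC starts. Here is a minimal sketch of what that could look like; the metric names and target values are hypothetical and would come from your own goals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One PoC exit criterion, agreed on before the PoC starts."""
    name: str
    target: float
    higher_is_better: bool = True

    def is_met(self, value: float) -> bool:
        if self.higher_is_better:
            return value >= self.target
        return value <= self.target


def unmet_criteria(criteria, results):
    """Return the names of criteria that are missing or not met."""
    unmet = []
    for criterion in criteria:
        value = results.get(criterion.name)
        if value is None or not criterion.is_met(value):
            unmet.append(criterion.name)
    return unmet


# Hypothetical criteria covering feasibility (recall, latency) and
# viability (time saved for users).
criteria = [
    Criterion("recall", 0.80),
    Criterion("latency_ms", 200, higher_is_better=False),
    Criterion("hours_saved_per_week", 10),
]

# Hypothetical results measured at the end of the PoC.
results = {"recall": 0.85, "latency_ms": 150, "hours_saved_per_week": 6}

failed = unmet_criteria(criteria, results)
decision = "proceed to MVP" if not failed else "stop and refocus"
print(decision, failed)
```

A result that was never measured counts as unmet here, which is a deliberate choice: a criterion nobody tracked cannot support the decision to invest further.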

Keep the team consistent and cross-functional

Different business and technical roles (including data scientist(s), machine learning engineer(s), software engineer(s)) are needed to achieve success with data products. The load on the different roles depends on the stage but it is highly recommended to keep the team stable from the first prototype on. Deployment experience within the team is a great plus.

Put emphasis on data architecture and technical infrastructure

Depending on the use case and its requirements (e.g. the frequency of predictions or the required latency), the data-related architecture has to be designed to include pre-processing, training, serving and deployment. The data architecture both depends on and influences the overall technical infrastructure, so it is best to design them together and validate their feasibility early.

Deploy early in a staging environment to test with real users

Data products are much more than mere machine learning models. They provide functionality to users (enabled by models) and this requires testing the entire product with real users in a setting close to the real-world application. Testing the product in a staging environment therefore provides a valuable chance to get first-hand feedback.

5. Data products need to be continuously improved

Apart from the data powering the product, collecting high quality product analytics and (qualitative) user feedback data is of utmost importance. With this data it is possible to generate product and user insights to continuously improve the product in line with your users’ needs.

Define goals, signals & metrics

Which goals should be achieved? Which signals are needed and which metrics will be tracked as a result? The entire team benefits from transparency around set goals and applied metrics, as it provides clarity and typically boosts the team spirit. Developers need clear requirements on how to implement analytics. Well-defined metrics help provide this clarity.

Ensure high quality of analytics data

Only thorough testing can ensure that the relevant analytics data is collected correctly and that the quality of the data is satisfactory. Is the metadata also available as intended? Are edge cases covered? Are the timestamps and keys correct, especially when combining with supporting data sets?
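Some of these questions can be turned into automated checks that run against collected events. The sketch below assumes a hypothetical event format (dictionaries with `event_id`, `user_id`, `name` and an ISO 8601 `timestamp`) and a set of user IDs taken from a supporting data set; your own schema and rules will differ.

```python
from datetime import datetime

# Assumed event schema, for illustration only.
REQUIRED_FIELDS = ("event_id", "user_id", "name", "timestamp")


def validate_events(events, known_user_ids):
    """Return a list of (event index, issue) tuples for the given events."""
    issues = []
    seen_ids = set()
    for index, event in enumerate(events):
        missing = [f for f in REQUIRED_FIELDS if f not in event]
        if missing:
            issues.append((index, "missing fields: " + ", ".join(missing)))
            continue
        # Keys: duplicates distort counts, unknown keys break joins.
        if event["event_id"] in seen_ids:
            issues.append((index, "duplicate event_id"))
        seen_ids.add(event["event_id"])
        if event["user_id"] not in known_user_ids:
            issues.append((index, "user_id not found in supporting data"))
        # Timestamps: must parse and carry a time zone to be comparable.
        try:
            parsed = datetime.fromisoformat(event["timestamp"])
        except (TypeError, ValueError):
            issues.append((index, "timestamp is not ISO 8601"))
            continue
        if parsed.tzinfo is None:
            issues.append((index, "timestamp has no time zone"))
    return issues


# Invented sample events containing typical problems.
events = [
    {"event_id": "e1", "user_id": "u1", "name": "open",
     "timestamp": "2022-12-21T09:00:00+00:00"},
    {"event_id": "e1", "user_id": "u1", "name": "open",
     "timestamp": "2022-12-21T09:00:00+00:00"},
    {"event_id": "e2", "user_id": "u9", "name": "click",
     "timestamp": "2022-12-21T09:05:00"},
    {"event_id": "e3", "user_id": "u1", "name": "click"},
]

for index, issue in validate_events(events, known_user_ids={"u1", "u2"}):
    print(f"event {index}: {issue}")
```

Checks like these do not replace manual testing of the tracking implementation, but they catch regressions once the product is live.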

Incorporate moment-of-truth feedback

Often it is best to implement feedback functionality directly in your data product. This not only ensures that the feedback is collected at the right moment in the process (cf. Moment-of-Truth) but also provides the chance to collect valuable supporting data on the specific situation.
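As a rough illustration, in-product feedback can be stored together with the situation the user was in when giving it. The field names, the 1 to 5 rating scale and the shape of the `prediction` dictionary below are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _utc_now_iso():
    """Current time as a time-zone-aware ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class FeedbackEvent:
    user_id: str
    rating: int  # assumed scale: 1 (not helpful) to 5 (very helpful)
    comment: str = ""
    context: dict = field(default_factory=dict)
    created_at: str = field(default_factory=_utc_now_iso)

    def __post_init__(self):
        if not 1 <= self.rating <= 5:
            raise ValueError("rating must be between 1 and 5")


def capture_feedback(user_id, rating, comment, prediction):
    """Attach what the product showed to the feedback itself."""
    context = {
        "prediction_id": prediction["id"],
        "model_version": prediction["model_version"],
        "shown_value": prediction["value"],
    }
    return FeedbackEvent(user_id, rating, comment, context)


# Hypothetical prediction the user is reacting to.
prediction = {"id": "p42", "model_version": "1.3.0", "value": "maintenance due"}
feedback = capture_feedback("u1", 2, "Warning came too late", prediction)
print(feedback)
```

Storing the model version and the shown value alongside the rating makes it possible to later relate negative feedback to a specific release or type of recommendation.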

Regularly ‘check-in’ with your data

A process is needed to continuously review performance, analytics & feedback data once the product is live (e.g. via weekly reviews). This generates insights and allows for an objective check of whether goals were achieved. Improvements can then be derived from the latest insights, and goals should be updated accordingly.

Let’s wrap up

Are there more ingredients? The complicated but short answer is yes, in particular around aspects of data (including the data itself, its processing, modeling and operations). In summary, these five ingredients should help you avoid common traps & pitfalls on the way to success with data & AI products:

Development as a product: Don’t treat data products differently to a digital product.
Business relevance: Dedicate efforts to evaluating & improving business impact early on.
Trustworthiness: Never forget that without trust, your product won’t be used.
Active in production: Try to launch early (if possible) to get real feedback.
Continuous improvement: Use analytics data and listen to your users.

A big thank you to Peter Kerschhofer and Karl Holzer for their valuable contributions to this article!

This article was written for Sclable’s blog on Medium.