Datability: a key dimension for a successful data-driven product

Florian BERGAMASCO
TotalEnergies Digital Factory
Feb 29, 2024
Photo by terimakasih0 on Pixabay

Datability? What kind of concept is that again?!

A data product is a solution that primarily uses data to create value for its users. It can be an application, a dashboard, an algorithm, a data science model, etc.

To explore, design and develop a data-driven product, mastering the technical and methodological aspects is not enough. The ideation and framing phases must be grounded in Datability, that is, the ability of a data science product, algorithm or model to exploit data efficiently, ethically and sustainably.

Datability is an indispensable dimension of the Data 360 framework. It ensures that the product is approached in the best possible way from a data point of view, by addressing the sources, types, quality, governance, accessibility and sustainability of the data, as well as the legal and ethical conditions of its use.

Many approaches can be used during the framing phase to assess the Datability of a Data Product. In this article, I examine two fundamental principles and one essential rule for building a Data-Centric, user-oriented product.

Let’s talk methodology

Photo by Skitterphoto on Pixabay

The concept of FAIR

FAIR is a method used to characterize a data flow and its use in a product.
Each data analysis addresses four dimensions:
- Is it Findable?
- Is it Accessible?
- Is it Interoperable?
- Is it Reusable?
These principles are designed to facilitate the sharing and reuse of data, and to promote the transparency and reproducibility of results. They apply to raw data as well as to processed, enriched or aggregated data.

To ensure compliance with the FAIR method, you need to:
- Apply Data Management best practices: data structuring, governance, ownership, etc.
- Describe the metadata and features that will be used to characterize, understand and use the data.
- Guarantee access to data, taking into account rights and any restrictions that may apply.
- Preserve data over time, ensuring its quality, integrity and security.
- Identify the level of risk associated with legal and intellectual property issues (GDPR, the European AI Act, the US Executive Order on AI, the Chinese IMMGAI, …).
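To make the checklist above more concrete, here is a minimal Python sketch that records a few metadata fields for one data flow and reports which FAIR dimensions have gaps. The `DataFlowDescriptor` fields, the `OPEN_FORMATS` set and the example values are hypothetical assumptions, not part of any standard FAIR tooling, and the sketch deliberately leaves out the legal and intellectual property risk assessment, which still needs human review.

```python
from dataclasses import dataclass


@dataclass
class DataFlowDescriptor:
    """Hypothetical metadata record describing one data flow used by a product."""
    name: str
    identifier: str = ""             # persistent ID or catalog entry (Findable)
    catalogued: bool = False         # registered in a data catalog (Findable)
    access_endpoint: str = ""        # where the data can be read (Accessible)
    access_policy: str = ""          # documented rights and restrictions (Accessible)
    data_format: str = ""            # e.g. "parquet" or "csv" (Interoperable)
    schema_documented: bool = False  # column meanings and units described (Interoperable)
    license: str = ""                # reuse conditions (Reusable)
    owner: str = ""                  # accountable data owner (Reusable)


# Assumption: formats the organisation treats as open and interoperable.
OPEN_FORMATS = {"csv", "parquet", "json"}


def fair_gaps(flow: DataFlowDescriptor) -> dict[str, list[str]]:
    """Return, for each FAIR dimension, the names of the checks that fail."""
    checks = {
        "Findable": [("identifier", bool(flow.identifier)),
                     ("catalogued", flow.catalogued)],
        "Accessible": [("access_endpoint", bool(flow.access_endpoint)),
                       ("access_policy", bool(flow.access_policy))],
        "Interoperable": [("open_format", flow.data_format.lower() in OPEN_FORMATS),
                          ("schema_documented", flow.schema_documented)],
        "Reusable": [("license", bool(flow.license)),
                     ("owner", bool(flow.owner))],
    }
    return {dim: [name for name, ok in items if not ok]
            for dim, items in checks.items()}


# Hypothetical example: turbine sensor data with a few gaps.
sensors = DataFlowDescriptor(
    name="turbine_sensors",
    identifier="turbine-sensors-v1",
    catalogued=True,
    access_endpoint="hypothetical://datalake/turbine_sensors",
    data_format="parquet",
    owner="wind-operations",
)
print(fair_gaps(sensors))
# {'Findable': [], 'Accessible': ['access_policy'], 'Interoperable': ['schema_documented'], 'Reusable': ['license']}
```

Returning the failing checks per dimension, rather than a single score, keeps the output actionable: each entry is a concrete task for the data owner.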

Source Dall-E via Bing Chat Enterprise

The concept of Data Minimalism

Data Minimalism consists of collecting and processing only the data the product actually needs, in order to avoid unnecessary dependencies that can be costly for the product. It is part of a Data Ethics approach, which involves respecting the principles of transparency, responsibility, sustainability and security in the use of data.

To comply with the Data Minimalism concept, you need to:
- Clearly define the problem to be solved.
- Identify the AI family to be implemented (Machine Learning, Optimization, Computer Vision, etc.) that will meet user needs.
- Identify the value to be delivered to users.
- Identify the data that is relevant and essential to solving the problem.
- Limit data collection and storage to what is strictly necessary.
- Delete or archive obsolete or useless data.
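The last three steps above (keep only essential data, limit what is stored, remove obsolete records) can be sketched in a few lines of Python. The field names, the `REQUIRED_FIELDS` set and the three-year retention period are hypothetical assumptions chosen for illustration; in practice they come out of the framing work described in the first steps.

```python
from datetime import datetime, timedelta

# Hypothetical: fields identified during framing as essential for the use case.
REQUIRED_FIELDS = {"timestamp", "turbine_id", "wind_speed", "power_kw"}
# Assumption: records older than this are obsolete for the product.
RETENTION = timedelta(days=3 * 365)


def minimise(records, now):
    """Keep only essential fields of non-obsolete records.

    Returns the reduced records and the sorted list of fields that were dropped.
    Obsolete records are simply skipped here; a real pipeline would delete or
    archive them according to the retention policy.
    """
    kept, dropped_fields = [], set()
    for rec in records:
        if now - rec["timestamp"] > RETENTION:
            continue
        dropped_fields |= set(rec) - REQUIRED_FIELDS
        kept.append({k: v for k, v in rec.items() if k in REQUIRED_FIELDS})
    return kept, sorted(dropped_fields)


records = [
    {"timestamp": datetime(2023, 6, 1), "turbine_id": "T1",
     "wind_speed": 8.2, "power_kw": 1450.0, "operator_email": "ops@example.com"},
    {"timestamp": datetime(2019, 1, 1), "turbine_id": "T1",
     "wind_speed": 5.0, "power_kw": 400.0},
]
kept, dropped = minimise(records, now=datetime(2024, 2, 29))
print(len(kept), dropped)  # 1 ['operator_email']
```

Reporting the dropped fields makes the minimisation visible: a personal field such as an operator e-mail address is a good example of data that adds risk without helping the prediction.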

Source Dall-E via Bing Chat Enterprise

The 3U rule

The 3U rule is a mnemonic for checking that the data to be provided to the product is Useful, Usable and Used. These three criteria are complementary and interdependent, enabling us to identify the data that is essential to the development of a Data-driven product.

- Useful: the data meets a real need. It provides users with added value, whether in terms of time savings, operational efficiency, cost savings, quality, safety, etc.
- Usable: the quality and availability of the data are assessed in order to measure the feasibility of implementing an algorithm, a data science model, etc.
- Used: usage indicators, tracked during all phases of development (exploration, prototyping, build), show whether the data is actually being used and meets users’ needs. Does it generate engagement, loyalty and recommendation through BI, AI models, etc.? Does it achieve the objectives set in terms of performance, profitability, visibility, etc.?
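The three criteria above can be turned into a simple per-source check. This is a minimal sketch, assuming a hypothetical `DataSourceReview` record filled in by the team, a 95% completeness threshold for "Usable" and a read counter as the usage indicator for "Used"; real projects would choose their own indicators.

```python
from dataclasses import dataclass


@dataclass
class DataSourceReview:
    """Hypothetical inputs gathered for one data source during framing and delivery."""
    name: str
    user_need: str        # Useful: the user need this data serves ("" if none identified)
    completeness: float   # Usable: share of non-missing values, between 0 and 1
    available: bool       # Usable: can the product access it reliably?
    reads_last_30d: int   # Used: usage indicator collected by monitoring


MIN_COMPLETENESS = 0.95  # assumption: quality threshold for "Usable"


def three_u(review: DataSourceReview) -> dict[str, bool]:
    """Evaluate the Useful / Usable / Used criteria for one data source."""
    return {
        "Useful": bool(review.user_need),
        "Usable": review.available and review.completeness >= MIN_COMPLETENESS,
        "Used": review.reads_last_30d > 0,
    }


def keep_source(review: DataSourceReview) -> bool:
    """The criteria are interdependent: keep a source only if all three hold."""
    return all(three_u(review).values())


wind = DataSourceReview("wind_speed", "forecast production", 0.98, True, 1200)
price = DataSourceReview("electricity_price", "", 0.99, True, 0)
print(three_u(price))  # {'Useful': False, 'Usable': True, 'Used': False}
print(keep_source(wind), keep_source(price))  # True False
```

Note that a source can be perfectly usable (clean, available) and still be dropped because nobody needs or uses it, which is exactly why the three criteria are evaluated together.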

In conclusion, the 3U rule helps frame Datability by linking data to user needs and by identifying risks to maintainability over time, while minimizing costs.

Source Dall-E via Bing Chat Enterprise

As you can see, Datability is a concept designed to promote the success of a product, by maximizing the value generated while minimizing costs over time.

What does this mean in practice?

Photo by congerdesign on Pixabay

Let’s imagine a product that predicts electricity production from a wind turbine. To apply the Datability principle to this use case, we need to ensure that the data used to train and evaluate the predictive models is of good quality, i.e. complete, consistent, conforming to expectations and free from anomalies, so that machine learning algorithms can reliably estimate how much energy the wind turbine will produce in a given time interval.
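The quality criteria above (complete, consistent, free from anomalies) can be checked before any model is trained. This is a minimal standard-library sketch; the field names, the 0 to 40 m/s and 0 to 3000 kW plausible ranges and the 3 m/s cut-in speed are illustrative assumptions, not values from a specific turbine.

```python
import math

CUT_IN_SPEED = 3.0  # m/s; assumption: below this the turbine should not produce
# Assumed plausible ranges; real limits depend on the turbine model, and some
# SCADA feeds legitimately record small negative power from auxiliary consumption.
VALID_RANGES = {
    "wind_speed": (0.0, 40.0),   # m/s
    "power_kw": (0.0, 3000.0),   # kW, assuming a 3 MW rated turbine
}


def is_missing(v):
    """Treat None and NaN as missing values."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def quality_report(rows):
    """Completeness and range checks: count missing and out-of-range values per field."""
    report = {}
    for name, (low, high) in VALID_RANGES.items():
        values = [r.get(name) for r in rows]
        report[name] = {
            "missing": sum(is_missing(v) for v in values),
            "out_of_range": sum(not is_missing(v) and not low <= v <= high for v in values),
        }
    return report


def inconsistent_rows(rows):
    """Consistency check: indices of rows reporting production below cut-in speed."""
    return [
        i for i, r in enumerate(rows)
        if not is_missing(r.get("wind_speed")) and not is_missing(r.get("power_kw"))
        and r["wind_speed"] < CUT_IN_SPEED and r["power_kw"] > 0
    ]


rows = [
    {"wind_speed": 7.5, "power_kw": 1200.0},
    {"wind_speed": None, "power_kw": 900.0},   # missing measurement
    {"wind_speed": 2.0, "power_kw": 500.0},    # producing below cut-in speed
    {"wind_speed": 9.0, "power_kw": -50.0},    # outside the assumed range
]
print(quality_report(rows))
# {'wind_speed': {'missing': 1, 'out_of_range': 0}, 'power_kw': {'missing': 0, 'out_of_range': 1}}
print(inconsistent_rows(rows))  # [2]
```

Flagged rows are not necessarily wrong; they are candidates for review with the turbine operators before deciding whether to clean, correct or exclude them.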

Which data to use: data minimalism.

To implement the principle of data minimalism, we need to identify the data that is essential for predicting electricity production and exclude data that is not relevant. Answering this first question requires iterating regularly on the data science approach during the exploration phase.

For example, essential data are those directly linked to the operation of the wind turbine, such as equipment sensor data, production history, and meteorological data such as wind speed, wind direction, etc.

We could try to add more and more data, but we should question the relevance of each addition by weighing its modeling benefits against its risks (cost, compliance, confidentiality, maintainability, etc.).

Data Minimalism Matrix

Data positioned in the high-risk, low-value area of the matrix are therefore considered non-essential, since they may entail confidentiality, security, cost or compliance risks. In our example, this applies to electricity demand, electricity price, or other meteorological data.
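Filling in the matrix can be sketched as a small scoring exercise. In this minimal Python sketch, the 1 to 5 value and risk scores and the threshold of 3 are hypothetical numbers a team might agree on in a workshop, not measured values. Only the "non-essential" label for low-value, high-risk data comes from the text; the other three quadrant labels are my own illustrative assumption.

```python
# Hypothetical 1-5 scores a team might assign in a framing workshop.
CANDIDATES = {
    "wind_speed": {"value": 5, "risk": 1},
    "wind_direction": {"value": 4, "risk": 1},
    "production_history": {"value": 5, "risk": 2},
    "equipment_sensors": {"value": 4, "risk": 2},
    "electricity_demand": {"value": 1, "risk": 4},
    "electricity_price": {"value": 2, "risk": 4},
}


def quadrant(value, risk, threshold=3):
    """Place a data candidate in one quadrant of a value/risk matrix."""
    high_value, high_risk = value >= threshold, risk >= threshold
    if high_value and not high_risk:
        return "essential"
    if high_value and high_risk:
        return "assess and mitigate risks"
    if not high_value and not high_risk:
        return "optional"
    return "non-essential"  # low value, high risk


for name, scores in CANDIDATES.items():
    print(f"{name}: {quadrant(**scores)}")
```

Keeping the scores in a plain table makes the decision easy to revisit: when an iteration of the exploration phase changes the expected value of a source, only its score needs updating.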

Once the matrix has been filled in, we can apply the FAIR principles to our use case. Let’s take a closer look at the Datability of the data essential for forecasting electricity production, focusing on two data sources in our example. As a reminder, the FAIR (Findable, Accessible, Interoperable, Reusable) methodology helps us frame Datability:

FAIR Framework

To create a user-centric Data Product, it is essential to apply the 3U rule throughout the entire product lifecycle. We need to focus on the needs and expectations of the product’s users, on the real, concrete use of data and models, and on the usefulness of the results for decision-making. This means clearly defining the profile, objectives and constraints of users, such as wind turbine operators, managers or regulators. It also involves determining the context, conditions and modalities for using data and models, such as the frequency, granularity, reliability or responsiveness of predictions.

Finally, it involves assessing the impact, value and relevance of the results provided by the models, in terms of optimizing production, reducing costs and improving safety, and thus taking into account the change-management potential of the product.

As you can see, datability is the first step in developing a Data-Centric product that meets users’ needs!
