Data quality + source of truth = decisions that make sense

Amira Roula
Volvo Cars Engineering
4 min read · May 17, 2022

--

Image source: mercurymediatechnology.com

Data quality is an important requirement for making well-founded data-driven decisions.

Garbage in gives garbage out

Using data just for the sake of it, or because someone “told you” to, rarely (if ever) generates good analyses or decisions.

In my assignment for the past few months, I have been working with the data analytics resources of the direct-to-consumer channel of Volvo Cars and with different sources of data. This is a tale of how engineers and non-engineers can work together to secure quality data and to use the same data sources and definitions.

Aligning Sources of Truth and Metric Definitions

Using quality data and the correct data definition is more complex than it sounds.

Let’s take a super easy concept such as the volume KPI. I mean, how many different definitions of the volume of a single product can there be? A thing is a thing; it probably even has a unique tag. But what constitutes the thing/product?

  • When do you count a product as a product and why?
  • Do you only count it when it is in your (the legal entity’s) possession?
  • What do you mean by possession? At a warehouse in the legal entity’s country, at a warehouse abroad, or in a factory being completed?

These questions relate to the definition aspect. Concepts are not easy to define, but they need to be defined.
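To make the questions above concrete, here is a minimal Python sketch of how the same set of units can yield different "volume" figures depending on the definition chosen. The `Unit` fields, the location labels, and the sample units are all hypothetical and only serve as an illustration; they do not describe any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    tag: str         # unique product tag
    location: str    # hypothetical labels: "factory", "warehouse_domestic", "warehouse_abroad"
    completed: bool  # has production finished?


# Hypothetical sample data, invented for illustration only.
units = [
    Unit("A1", "factory", False),
    Unit("A2", "factory", True),
    Unit("A3", "warehouse_domestic", True),
    Unit("A4", "warehouse_abroad", True),
]


def volume_all_tagged(items):
    """Definition 1: everything that has a tag counts."""
    return len(items)


def volume_completed(items):
    """Definition 2: only completed units count, wherever they are."""
    return sum(1 for u in items if u.completed)


def volume_domestic_warehouse(items):
    """Definition 3: only units in a warehouse in the legal entity's country."""
    return sum(1 for u in items if u.location == "warehouse_domestic")


volumes = {
    "all_tagged": volume_all_tagged(units),
    "completed": volume_completed(units),
    "domestic_warehouse": volume_domestic_warehouse(units),
}
print(volumes)
```

Three reasonable definitions give three different numbers for the same data, which is why the definition has to be agreed on before the metric is used.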

Even if you have a very clear and aligned definition of your data, it is important to know which data source hosts the “truth” when there are multiple sources of the same data. It is also crucial to know the data lineage before using the data. You should know whether the data you are using is the raw version or has already been pre-processed and, if so, how the processing could have affected the raw data.
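One simple way to see why a designated source of truth matters is to compare the same metric across two sources. The sketch below assumes two hypothetical systems (`warehouse_system` and `finance_report`) with invented weekly figures; the names and numbers are illustrative assumptions, not real data.

```python
# Hypothetical weekly volume figures for the same metric from two systems.
warehouse_system = {"2022-W01": 120, "2022-W02": 135, "2022-W03": 128}
finance_report = {"2022-W01": 120, "2022-W02": 131, "2022-W03": 128}


def find_mismatches(truth, other):
    """Return {period: (truth_value, other_value)} for periods where the sources disagree.

    A period missing from one source shows up with None on that side.
    """
    periods = sorted(set(truth) | set(other))
    return {
        p: (truth.get(p), other.get(p))
        for p in periods
        if truth.get(p) != other.get(p)
    }


mismatches = find_mismatches(warehouse_system, finance_report)
print(mismatches)
```

A check like this does not say which source is right. It only surfaces the disagreement, so the stakeholders can decide which source should be treated as the truth and why the other one deviates.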

In my work with creating a new sales model and way of working with forecasting volumes, I started by thoroughly investigating:

  • Who owns the data?
  • Who uses the set of data I was working with?
  • How do the data stakeholders define the data and how is it used? How do they adapt it/use it in their work?

The user and owner list made up the stakeholder list and laid the foundation of working with one single definition of metrics and having a list of “sources of truth” for data.

Through meetings, workshops and by manager engagement, several key concepts could be defined, and sources set as the main sources of truth for the defined data.

The secret sauce for making a functional, decentralized data organization is having a strong and involving governance, which can guarantee efficiency by aligning data definitions, data formats, and best practices for working with data.

Use the same definitions and data sources

Aside from defining concepts correctly, setting the right definitions, and using the correct data sources, you need to make sure they are aligned within the organization.

There are questions to be answered. For example, who owns the definition of a certain concept and its relevant metrics? Is it OK for different teams or departments to use different definitions if that is mandated or needed by the nature of their work, for example using book value figures for volume instead of warehouse system data?

It is important to have a governance mechanism, for example by setting up workshops and meeting with key stakeholders and management to:

  • Decide who defines the key concepts, e.g., metrics and KPIs, and why: Those who set the definitions should own the responsibility for updating them when needed. They are also responsible for alignment, which is discussed in the next point, and for providing the data lineage and pointing to the right source of data.
  • Get buy-in from key stakeholders who might be affected: I can guarantee that the changes you decide on, e.g. definitions of concepts and new data definitions, will be “forgotten” if you decide without strong alignment and documentation. For the sake of reusability and traceability, it is important that all definitions are documented and version-controlled. To avoid future friction, the change decisions should also be documented.
  • Communicate the decisions and changes: It is important that there are direct channels between the owners and consumers of the metrics, where all stakeholders get transparent and up-to-date information about the changes. Using central data colleagues, sharing information in the governance meetings and workshops, etc., could help with this matter. This kind of change management is usually not top-of-mind or prioritized, but with the right kind of mindset and management, it will get escalated to the proper channels.
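The points above (an owner per definition, a named source of truth, documented and versioned changes) can be sketched as a small metric registry entry. This is only an illustration under assumptions: the `MetricDefinition` class, its fields, and the example values are hypothetical, and in practice the same information could just as well live in a version-controlled document or a data catalog.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MetricDefinition:
    name: str
    owner: str            # team responsible for the definition and its updates
    source_of_truth: str  # the agreed source for this metric
    definition: str
    version: int = 1
    history: list = field(default_factory=list)  # documented change decisions

    def update(self, new_definition, reason, changed_on):
        """Record the old definition and the reason, then bump the version."""
        self.history.append((self.version, self.definition, reason, changed_on))
        self.definition = new_definition
        self.version += 1


# Hypothetical example values, invented for illustration only.
volume = MetricDefinition(
    name="volume",
    owner="sales analytics team",
    source_of_truth="warehouse system",
    definition="Completed units in a domestic warehouse",
)
volume.update(
    "Completed units in any warehouse owned by the legal entity",
    reason="Agreed in a governance workshop",
    changed_on=date(2022, 1, 1),  # placeholder date
)
print(volume.version, volume.definition)
print(volume.history)
```

Keeping the previous definition and the reason for the change next to the current one makes it possible to trace why a figure changed between two reports.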

Conclusion

Setting the right definitions for data, aligning on those definitions, and deciding on the right source for extracting relevant metrics is a big challenge. The problem is more pronounced in large organizations, where several departments and teams generate data that is used by multiple consumers. Strong and involving governance is proposed as a solution to close some of these gaps. This governance should consist of those who work with data and those who have the relevant mandates to anchor the decisions.
