The one metric that all data teams need to track for success

Darth Data
The Modern Scientist
5 min read · Dec 11, 2022

Unlock your data team’s full potential with the one metric that matters: the north star KPI.

The business is incentivized to support a data team when it is confident that the team delivers value. But how do you communicate the value of a data team? How do people know that your data strategy is working, and what evidence can you provide?

Data teams need to communicate the impact they bring to the business. Therefore, establishing a data team "north star" KPI is essential.

Data Strategy

Before establishing a north star, a data team should have a data strategy: a guiding principle for how the team operates and why it exists, and a blueprint for how it will achieve success. A data strategy is not a project, a plan, or a mandate that dictates exactly how a data team should operate. Instead, it is a document written and peer-reviewed by the data team, the data product manager, and your business stakeholders that provides an aspiration and a framework for how the team makes decisions, large or small, that propel the organization forward.

What can serve as a north star metric for a data team? I would recommend breaking it down into a few key themes: promoting innovation, ensuring a reliable and trustworthy data platform, increasing the productivity of the organization, or running the data platform in an efficient and cost-effective manner.

Promoting innovation

A good north star metric for this could be the number of machine learning models built per month or per quarter, the number of new raw datasets produced, or, if datasets are already in place, the number of curated datasets per business case.

ML innovation ratio = (Number of new ML models / Number of all ML models) x 100

Dataset innovation ratio = (Number of new raw or curated datasets / Number of active datasets) x 100
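
As a quick illustration, here is a minimal Python sketch of both ratios. The counts are made up; in practice you would pull them from your model registry or data catalog.

```python
# Minimal sketch of the two innovation ratios. All counts are
# hypothetical; pull real numbers from your model registry or catalog.
def innovation_ratio(new_count: int, total_count: int) -> float:
    """Share of new assets among all active assets, as a percentage."""
    if total_count == 0:
        return 0.0
    return new_count / total_count * 100

ml_ratio = innovation_ratio(new_count=3, total_count=25)        # new vs. all ML models
dataset_ratio = innovation_ratio(new_count=8, total_count=120)  # new vs. active datasets
print(f"ML innovation ratio: {ml_ratio:.1f}%")            # 12.0%
print(f"Dataset innovation ratio: {dataset_ratio:.1f}%")  # 6.7%
```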

Business Value of your dataset

When you produce a dataset, there is an input (usually the time, effort, and cost needed to produce it), which is measured against the value or impact of the outcome the dataset provides.

As an example of how to measure the input side:

Cost factor = (Compute and storage cost + Staff cost) / Revenue

In this example, we calculate the compute and storage costs of datasets, plus the staff cost, as a percentage of the organization's revenue. You can slice this metric by project or product, or, if it is a platform service, by the revenue produced by all the products and services relying on your datasets. You can also measure the cost of datasets that have not been accessed for a while, to help you decide whether to keep them or ditch them; remember that keeping the pipelines running and storing the data costs money in the long run.
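
To make this concrete, here is a small sketch that slices the cost factor by project. All the figures are hypothetical; in practice they would come from your cloud billing and staffing data.

```python
# Hypothetical cost-factor calculation, sliced by project.
datasets = [
    {"project": "recommendations", "infra_cost": 12_000, "staff_cost": 30_000},
    {"project": "reporting", "infra_cost": 4_000, "staff_cost": 15_000},
]
revenue = 2_500_000  # revenue of the products relying on these datasets

for d in datasets:
    cost_factor = (d["infra_cost"] + d["staff_cost"]) / revenue * 100
    print(f"{d['project']}: cost factor {cost_factor:.2f}% of revenue")
```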

Next, we calculate the value of the dataset which can fall under any of the following:

Return on investment (ROI): Can you measure the financial return generated by using the data, such as cost savings, revenue increases, or reduced expenses?

Cost savings: Do you know the direct cost savings generated by using the data, such as reduced labor costs or reduced costs for external data sources?

Time savings: Does your organization save time by using the data, such as by automating manual processes or by reducing the time spent on data-related tasks?

Improved decision-making: To what extent does the data help your organization make better decisions? Does it provide more accurate or complete information?

Increased efficiency: By using the data, has your company become more efficient, for example by reducing the time and effort required to complete tasks?

To summarize this into a north star KPI:

Value factor = (Net income contribution of dataset + Cost reduction contributed by dataset) / (Staff cost + Compute and storage cost) x 100

In the above, each dataset produces data that adds value to a product or a service. To estimate that value, ask the product manager or business leader in charge of the product to quantify the dataset's contribution based on ROI, cost or time savings, and how the dataset improves their users' decision-making or their own.
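
Here is a minimal sketch of that calculation, treating net income contribution and cost reduction as the benefit side of the ratio. Every figure is a made-up placeholder for the product manager's estimate.

```python
# Hypothetical value-factor calculation for a single dataset.
net_income_contribution = 80_000  # revenue attributed to the dataset
cost_reduction = 20_000           # savings the dataset enabled
staff_cost = 30_000               # people cost to build and maintain it
infra_cost = 12_000               # compute and storage cost

value_factor = (net_income_contribution + cost_reduction) / (staff_cost + infra_cost) * 100
print(f"Value factor: {value_factor:.0f}%")  # 238%: value well above cost
```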

Data freshness and reliability

The fundamental principle is that a dataset should be up and running 99.9% of the time and should be refreshed within the time required by your Service Level Objective (SLO). You can measure the number of outages (hours of downtime), the number of bugs discovered, and the number of data pipeline failures that impact your data platform. Whenever a dataset is delivered late against its SLO, there is an impact on the accuracy of reporting or analytics applications that rely on the dataset being available on a particular schedule.

Therefore, it is important to measure reliability, for example:

Dataset reliability = 1 - (Monthly downtime hours / Total hours in the month)

Data freshness = Number of successful dataset deliveries in a month* / Total days in the month

*A successful dataset delivery means the dataset was made available on time, measured when the data pipeline runs successfully and the dataset is materialized in final storage. For example, if the dataset is expected to be available at 08:00 with 5 minutes of allowable slack, you count the number of times the dataset is delivered (materialized) on or before 08:05.
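
The sketch below computes both metrics for one month, assuming a single daily delivery with an 08:05 deadline; the downtime and delivery times are made up.

```python
# Hypothetical reliability and freshness calculation for one month.
from datetime import time

hours_in_month = 30 * 24
downtime_hours = 4  # from your incident tracker
reliability = 1 - downtime_hours / hours_in_month
print(f"Dataset reliability: {reliability:.4f}")  # 0.9944

deadline = time(8, 5)  # 08:00 SLO plus 5 minutes of slack
# One materialization timestamp per day: 27 on-time runs, 3 late runs.
delivery_times = [time(8, 0)] * 27 + [time(8, 30)] * 3
on_time = sum(t <= deadline for t in delivery_times)
freshness = on_time / 30
print(f"Data freshness: {freshness:.0%}")  # 90%
```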

Summary

A north star KPI for your data team and your data platform depends heavily on your organization's priorities and how far along you are in your data journey. Without a north star KPI, it is hard to explain the value of the data platform or the data team's contribution; hence it is important that a north star KPI is discussed, agreed upon, and made visible (through a dashboard or a monthly report). Whenever the north star KPI trends in the wrong direction, take action to improve it. Good luck, and I hope this inspires your data team to be successful!
