How To Set the Right KPIs for GenAI

Cal Al-Dhubaib
3 min read · Aug 29, 2024


[Image: a line of robots standing next to a ruler]

If 2023 was the year of experimentation, 2024 is about deploying generative AI in the enterprise. With organizations ramping up their investments in GenAI technologies, the next step is measuring and proving the value of these investments.

However, unlike other projects, setting the right KPIs for generative AI requires a slightly different mindset.

As one Atlassian author puts it, “Output over time is a good way to measure the impact of machines, not knowledge workers.” If we measure the value of AI in terms of the time it frees up for humans, expecting that this will naturally lead to increased productivity, we risk setting ourselves up for failure.

He goes on to say, “When we talk about productivity, we are inherently and inescapably talking about output — not outcomes. When we talk about increasing productivity, we’re really talking about increasing output.”

Instead, we should consider measuring the outcomes (as opposed to output) of using AI.

Understanding the challenges and opportunities of GenAI will be key to ensuring that enterprises not only adopt these technologies but also get meaningful and measurable benefits from them.

This article was originally published on opendatascience.com.

Why GenAI KPIs Should Mirror Workforce Evaluation

Organizations often measure AI value in terms of time saved, but that’s just scratching the surface. Relying on time saved alone can lead to mismatched ROI expectations, especially once the cost of running GenAI solutions at scale is factored in.

The real value lies in how well AI enhances task-specific success metrics, like reducing fulfillment time in customer service or increasing engagement in marketing.

In general, your KPIs should reflect the outcomes of using AI, not the model itself. The good news: If humans have been performing a task with some consistency within your organization, you already have a great set of metrics to measure the value of the AI-enabled solution.

These metrics might include efficiency in completing tasks, output per employee, error rate, and customer satisfaction, to name a few.
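As a minimal sketch of this idea, the same workforce-style metrics can be recorded for the human baseline and the AI-assisted workflow and compared directly. All metric names and figures below are hypothetical, purely for illustration:

```python
# Hypothetical before/after comparison of workforce-style KPIs.
# Every metric name and number here is illustrative, not a real benchmark.

def pct_change(before: float, after: float) -> float:
    """Relative change from a baseline, as a percentage."""
    return (after - before) / before * 100

# Baseline (human-only) vs. AI-assisted workflow, measured the same way.
kpis = {
    "avg_minutes_per_ticket": (18.0, 12.5),   # efficiency in completing tasks
    "tickets_per_agent_day":  (22.0, 30.0),   # output per employee
    "error_rate_pct":         (4.0, 2.5),     # error rate
    "csat_score":             (4.1, 4.4),     # customer satisfaction (1-5)
}

for name, (before, after) in kpis.items():
    print(f"{name}: {before} -> {after} ({pct_change(before, after):+.1f}%)")
```

The key design point is that both columns are measured with the same instruments: the AI-enabled workflow inherits the yardstick the organization already uses for people, which is what makes the comparison apples-to-apples.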

In some cases, individuals may spend more time on a task but can explore a much larger set of creative possibilities or perform work at higher quality.

Where Model Benchmarks and KPIs Intersect

While GenAI benchmarks give us a general understanding of a model’s capabilities, company-specific KPIs are great for measuring performance against business goals and use cases. When combined, they become the gold standard for evaluating AI models.

Here are just a few areas where benchmarks and KPIs intersect:

  • Standardized benchmarks offer a broad assessment of an AI model’s capabilities, while KPIs provide a specific measure of how well the model performs in the organization’s unique context.
  • Benchmarks can be used for initial screening to select promising models. Once selected, these models can be evaluated continuously using KPIs to ensure ongoing performance and alignment with business objectives.
  • Benchmarks help set expectations about a model’s potential, while KPIs measure its actual real-world impact on the business.

Ultimately, businesses must learn to separate a model’s raw performance from its efficacy in the business.

All models are wrong some of the time. But a model that is right 99% of the time is worth little if you can’t act on its results or integrate it into a workflow.

Conversely, a less accurate model may be more valuable if integrated correctly into a workflow where expert humans can seamlessly intervene as necessary.

Let’s Connect

If you found this article valuable, I’d love to connect further! Feel free to subscribe to my content right here on Medium, connect with me on LinkedIn, or contact the Further team to discuss upcoming projects and opportunities.
