Calculate ROI of an AI Concept

Measuring the business value of an idea without perfect information is venture design.

Fergie Leung
Apr 26 · 5 min read

Disclaimer: Opinions expressed are solely my own and do not express the views or opinions of my employer.

Assessing the business value of early-stage ideas is a big part of my job. A product concept can be evaluated in many ways. While UX people prefer to use qualitative data (usually nicely organized in what we call “value proposition canvas”) to communicate the human-centered value of a concept, business folks prefer to see numbers.

As someone bridging the two groups, I see tremendous value in understanding how a dollar amount is “assigned” to a concept. This article answers one question:

How can one quantify the business value of an idea?

I will use a sanction classifier as an example to take you through the journey of how I assign value to an AI concept.

What’s Unique about AI

For deterministic software, return is simply the value of each use multiplied by the number of uses. The probabilistic nature of AI complicates the calculation of ROI by introducing a chance of errors and a cost of errors.

Calculating Return

Return = Value per prediction * Number of predictions - (Chance of errors * Cost of errors) * Number of predictions

- Return is the value generated by an AI solution.
- Value per prediction is the value generated by a single prediction. This can be time saved, cost reduction, or new revenue.
- Chance of errors is the probability that the model makes an error. (The definitions of errors and ways to measure them are discussed in the next section.)
- Cost of errors is the additional cost incurred by a wrong prediction.

A prediction is considered the unit of AI model output.
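As a minimal sketch, the formula translates directly into a Python function (the function and variable names are my own, not from any library):

```python
def ai_return(value_per_prediction, n_predictions, error_rate, cost_per_error):
    """Return = value/prediction * predictions - (error rate * cost/error) * predictions."""
    gross_value = value_per_prediction * n_predictions
    expected_error_cost = error_rate * cost_per_error * n_predictions
    return gross_value - expected_error_cost
```

With the sanction-classifier numbers worked out later in this article ($1.50 per prediction, 1,000 predictions, a 3% chance of errors, $4.80 per error), this yields roughly $1,356.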

To put this formula in a realistic context, let’s look at a sanction screening AI solution, or sanction classifier, as an example.

Sanction Classifier

Let’s say correctly identifying high-risk individuals among 1,000 customers takes 50 hours (3,000 mins) manually and 5 mins with the AI solution. The value generated per 1,000 predictions by AI is 2,995 mins, or roughly $1,500 saved (assuming a human cost of $30/hour). Value per prediction is therefore about $1.50.
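The arithmetic above can be sketched as a short script (the hourly rate and timings are the assumptions stated in this example):

```python
HOURLY_RATE = 30        # assumed human cost, $/hour
n_predictions = 1000    # customers screened

manual_minutes = 50 * 60   # 50 hours of manual screening
ai_minutes = 5             # time taken by the AI solution

minutes_saved = manual_minutes - ai_minutes            # 2,995 mins
dollars_saved = minutes_saved * HOURLY_RATE / 60       # $1,497.50, ~ $1,500
value_per_prediction = dollars_saved / n_predictions   # ~ $1.50
```

Note that $1,497.50 is rounded up to $1,500 here, which is where the $1.50 value per prediction comes from.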

But what if a high-risk individual is missed by the model? Is AI more likely to miss a target than a human? We answer this by comparing the chance of errors between the AI solution and humans.

Chance of Errors

- Accuracy is the ratio of correct predictions (true positives + true negatives) to total samples. It emphasizes being correct in identifying both positives and negatives.
- Recall is the ratio of true positives to total actual positives. It emphasizes capturing every target somewhere in the model output; optimizing for it tends to increase the number of false positives.
- Precision is the ratio of true positives to total predicted positives (true positives + false positives). It concerns being right among all positive predictions and ignores negative observations.
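All three metrics can be computed from the four confusion-matrix counts. A minimal sketch (names are my own):

```python
def metrics(tp, fp, tn, fn):
    """Accuracy, recall, and precision from confusion-matrix counts.

    tp/fp: true/false positives; tn/fn: true/false negatives.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return accuracy, recall, precision
```

For the human benchmark discussed below (2 true positives and 30 false positives among 1,000 screened, with no known false negatives), this gives 97% accuracy and 6.25% precision.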

Learn more about precision and recall in the Google People + AI Guidebook.

Optimize for Recall

The primary goal for any financial institution is to include all targets (true positives) in the AI findings. Although this preference leads to a notoriously high false-positive rate and drives up the cost of inspection, it is a tradeoff every financial institution would make: false positives, or flagging the wrong individuals as targets, cost far less than false negatives, or missing a target.

The secondary goal is to reduce the cost of inspecting false positives, i.e., to lower the cost of finding true positives while keeping the false-positive rate. The sanction classifier achieves this by cutting the manual hours needed.

Benchmarking

If, historically, an average of 2 targets (known true positives) are identified among 32 suspects (total predicted positives) per 1,000 individuals scanned by human associates, then the relative precision is 6.25% (= 2/32). Since the remaining 30 flagged cases are known mis-classifications, the relative accuracy is 97% (= 970/1000), and the chance of errors for human associates is 3% (= 30/1000).

The AI model should therefore aim for at least 97% relative accuracy while keeping relative precision above 6.25%.
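The benchmark numbers can be reproduced with a short script (all figures are the historical averages assumed above):

```python
# Historical human benchmark per 1,000 individuals screened
total_screened = 1000
predicted_positives = 32     # suspects flagged by associates
known_true_positives = 2     # confirmed targets among them

false_positives = predicted_positives - known_true_positives   # 30 mis-classifications

relative_precision = known_true_positives / predicted_positives          # 0.0625
relative_accuracy = (total_screened - false_positives) / total_screened  # 0.97
chance_of_errors = false_positives / total_screened                      # 0.03
```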

Cost of Errors

Even when the model performs well, human associates might need to do extra work to correct its errors. Assume 10 mins (≈0.16 hr) are needed to clean up after the AI flags a false positive. The cost per error is then about $4.80 (= $30 × 0.16).

Putting it all together

Return
= Value per prediction * Number of predictions - (Chance of errors * Number of predictions) * Cost of errors
= $1.50 * 1000 - (3% * 1000) * $4.80
= $1,356

Alternatively, you can find the number of predictions X needed for a $1,500 AI investment to break even:

$1,500 = Value per prediction * X - (Chance of errors * X) * Cost of errors
= $1.50X - (0.03X) * $4.80
= $1.356X
X ≈ 1,106
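Solving the break-even equation for the number of predictions can be sketched as follows (the function name is my own):

```python
def break_even_predictions(investment, value_per_prediction, error_rate, cost_per_error):
    """Number of predictions at which return covers the investment (round up in practice)."""
    # Net value of one prediction after expected error cost: $1.356 in this article
    net_value = value_per_prediction - error_rate * cost_per_error
    return investment / net_value

# With the article's numbers, break-even falls at roughly 1,106 predictions
x = break_even_predictions(1500, 1.50, 0.03, 4.80)
```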

Conclusion

A method that is theoretically sound might not always apply in reality. Building ventures is about accepting that perfection does not exist and finding ways to advance without perfect information. I hope you enjoyed the read and learned something new.
