Valuing AI — Part 2: Update your priors!

İhsancan Özpoyraz
Published in KoçDigital · Feb 17, 2022 · 6 min read


Illustration of Thomas Bayes with Bayes’ Theorem. Source: James Kulich

“By far the greatest danger of Artificial Intelligence (AI) is that people conclude too early that they understand it,” says Eliezer Yudkowsky, an American AI researcher.

Although Yudkowsky’s caveat reflects his concerns about the AI singularity (a concept that has nothing to do with this blog post), I conveniently misread his quote and decontextualize it as follows: people tend to draw premature conclusions about AI, which can mislead them about its real value to their organizations.

Valuing AI is not only complicated but also hard to standardize. In this three-part blog series, I bring up a few concepts for discussion that help establish a framework for translating AI’s value. These are:

  • Expected Value of Perfect Information
  • Bayesian Thinking
  • Value at Risk

In the first post, I presented the Expected Value of Perfect Information and illustrated its use on the Predictive Maintenance use case (Part-1). In this second piece, you will be introduced to Bayesian thinking, again applied to the same use case.

The term Bayesian derives from the 18th-century English mathematician, statistician, philosopher, and Presbyterian minister Thomas Bayes, who is known for formulating a specific case of the theorem that bears his name: Bayes’ Theorem. Bayes never published what would become his most famous accomplishment; his notes were edited and published posthumously by Richard Price (description adapted from Wikipedia).

Bayesian thinking is a framework for critical thinking, as Bayes’ Theorem offers a remarkable tool for any rational and critical thinker. The formula is this:

Bayes’ Theorem: P(A | B) = P(B | A) x P(A) / P(B)

Once adopted, Bayesian thinking helps you avoid cognitive biases and view what is going on around you through the lens of probability instead of 1s and 0s.

Although Alan Turing’s legendary work on decoding the German Enigma during WWII is the best-known application of Bayes’ Theorem, the theory is used across a wide range of disciplines in today’s world. Bayesian methods “are rippling through everything from physics to cancer research, ecology to psychology” (sourced from The New York Times).

Medical testing is a popular example for demonstrating Bayes’ formula; indeed, epidemiologists across the globe have been using the concept for a long time. According to Marc Lipsitch, an infectious disease epidemiologist at Harvard University, Bayesian reasoning comes “awfully close” to his working definition of rationality (sourced from his Twitter). In this blog post, I will show how Bayesian thinking can help decision-makers value an AI solution by associating Predictive Maintenance with rare-disease testing.

First, we need to set up the problem, keeping the formula in mind. I start with rare-disease testing (the numbers provided are hypothetical):

P(disease | positive test): Probability that a patient has the disease given that their test came back positive. This is what we seek: how reliable is a positive test result?

P(positive test | disease): Probability that a patient tests positive given that they have the disease (the ‘true positive rate’ of the test). Assume there is a 98% chance of this.

P(disease): Probability that a patient has the disease. Assume there is a 1% chance of this.

P(no disease): Probability that a patient does not have the disease. Based on the above, there is a 99% chance of this.

P(positive test | no disease): Probability that a patient tests positive given that they do not have the disease (the ‘false positive rate’ of the test). Assume there is a 1.1% chance of this.

In this case, using Bayes’ Theorem, P(disease | positive test) equals roughly 47%. See the calculation below:

P(disease | positive test) = A / (A + B)

A = P(positive test | disease) x P(disease) = 0.98 x 0.01 = 0.0098

B = P(no disease) x P(positive test | no disease) = 0.99 x 0.011 ≈ 0.0109

P(disease | positive test) = 0.0098 / (0.0098 + 0.0109) ≈ 0.47
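To make this concrete, below is a minimal Python sketch of the same calculation. The helper name bayes_posterior and its signature are my own for illustration, not from any library:

def bayes_posterior(true_positive_rate, false_positive_rate, prior):
    """Return P(event | positive signal) using Bayes' Theorem."""
    a = true_positive_rate * prior          # P(signal | event) x P(event)
    b = false_positive_rate * (1 - prior)   # P(signal | no event) x P(no event)
    return a / (a + b)

# Rare-disease testing: 98% true positive rate, 1.1% false positive rate, 1% prevalence
print(bayes_posterior(0.98, 0.011, 0.01))  # ~0.47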

That means if you happen to test positive, you probably should not worry too much, as there is only a 47% chance that you are really sick.

Wait a second, this doesn’t sound right; there must be some mistake! After all, the test statistics give evidence that the test is almost faultless: P(positive test | disease), the true positive rate, is 98%, and P(negative test | no disease), the true negative rate, i.e., 1 − P(positive test | no disease), is 98.9%. How on earth did we get a probability of less than 50% for having the disease given a positive test? Bayes’ Theorem has the answer: it is because people who have the disease are rare (only 1% of the population). Hence, despite the test being highly accurate, the odds are almost even between having and not having the disease given a positive result. Pretty interesting, isn’t it?

So, how does this help us with the Predictive Maintenance use case? Decision-makers usually give the green light to a predictive solution if they are sufficiently convinced of its predictive power, i.e., its reliability. However, it would be extremely misleading to consider only the model’s true positive and false positive statistics: reliability varies wildly with the frequency of the event the model tries to predict.

Here is an example that associates the Predictive Maintenance use case with rare-disease testing:

P(breakdown | alarm): Probability that a machine will break down given that the Predictive Maintenance model raises an alarm because it expects a breakdown. This is what we seek: how reliable is the alarm?

P(alarm | breakdown): Probability that the model raises an alarm given that we know the machine will break down (the ‘true positive rate’ of the model). Assume there is an 85% chance of this.

P(breakdown): Probability that a machine breaks down. Assume there is a 25% chance of this.

P(no breakdown): Probability that a machine is in good condition. Based on the above, there is a 75% chance of this.

P(alarm | no breakdown): Probability that the model raises an alarm given that we know the machine is in good condition and will not break down (the ‘false positive rate’ of the model). Assume there is a 15% chance of this.

In this case, using Bayes’ Theorem, P(breakdown | alarm) = (0.85 x 0.25) / (0.85 x 0.25 + 0.75 x 0.15) ≈ 65%.
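The same hypothetical bayes_posterior helper from the sketch above confirms this:

# Predictive Maintenance: 85% true positive rate, 15% false positive rate, 25% breakdown rate
print(bayes_posterior(0.85, 0.15, 0.25))  # ~0.65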

Now, assume a decision-maker who seeks a Predictive Maintenance solution that is at least 75% reliable, i.e., out of every 100 alarms, at least 75 should truly detect a breakdown. In other words, once the model raises an alarm, it should be sufficiently worthwhile (above 75%) to send a maintenance crew to the field and start a costly maintenance operation that interrupts the continuous manufacturing process. If this decision-maker considers the model’s true positive rate (85%) and gives the green light because 85% exceeds the 75% decision criterion, she/he will be disappointed. Instead, she/he should consider P(breakdown | alarm) = 65%, which shows that the offered solution actually underperforms against the criterion.

In fact, the decision-maker should seek a solution with, for example, a 90% true positive rate and a 5% false positive rate. Given the machine failure frequency (25%) at this decision-maker’s company, those rates yield P(breakdown | alarm) ≈ 86%, which meets the expectation.

On the other hand, another decision-maker whose company’s machine failure frequency is 35% would get what she/he wants from the very same offer (85% true positive rate, 15% false positive rate), since in her/his case P(breakdown | alarm) equals roughly 75%.
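Both scenarios follow from the same hypothetical helper:

# A better model at the original 25% breakdown rate...
print(bayes_posterior(0.90, 0.05, 0.25))  # ~0.86, clears the 75% bar
# ...and the original model (85%/15%) at a 35% breakdown rate
print(bayes_posterior(0.85, 0.15, 0.35))  # ~0.75, just meets the bar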

It’s quite subtle, but evaluating a technology as complicated as AI requires scrutiny. Bayesian thinking certainly deserves attention when valuing an AI solution such as Predictive Maintenance, as it provides a completely new perspective. Referring back to Yudkowsky: we should not hurry; instead, we should think twice and update our priors before drawing conclusions about AI.

İhsancan Özpoyraz | Senior Consultant, KoçDigital
