How Analogies to Explain Accuracy Affect Human Reliance on the System

Gaole He
Published in ACM CSCW
Sep 20, 2023

This post, written by Gaole He, summarizes a research paper authored by Gaole He, Stefan Buijsman, and Ujwal Gadiraju from TU Delft. The paper will be presented at the CSCW 2023 conference on November 17 (9–10:30 am UTC-5) in the XAI 2 session.

Background

AI systems are increasingly being used to support human decision making, and it is important that AI advice is followed appropriately. However, users typically under-rely or over-rely on AI systems, which leads to sub-optimal team performance. Stated system accuracy (e.g., a percentage indicating how accurate the system is) is one important attribute of an AI system that substantially affects user trust and reliance behaviors.

Decision making with accept/reject: should I choose this option? (Photo by agile42)

Motivation

An unanswered question in this context pertains to why users tend to under-rely on AI systems despite their relatively high stated accuracy. Perhaps users do not properly calibrate their reliance on the AI system because they have trouble interpreting the accuracy level when presented only with an overall accuracy value. To confirm and address this issue, we propose the use of analogies to enhance the understanding of global accuracy measures. To our knowledge, this is the first attempt of its kind to improve the intelligibility of system-level measures.

An analogy can be interpreted as a structural mapping of a target domain that needs to be clarified (in this case, overall system accuracy) onto a source domain with which the recipient of the analogy is more familiar. As a simple example, one might convey how hard a task is by saying "it is as hard as finding a needle in a haystack." Because the recipient is likely to know that finding a needle in a haystack is difficult, they can infer that the task in the target domain will also be difficult.

Illustration of explaining accuracy with an analogy

In our study, we adopted three types of analogies to elucidate the stated accuracy of an AI system:

  1. The system is 75% accurate, which is about as reliable as the AstraZeneca vaccine is at protecting against COVID-19
  2. The system is 75% accurate, which is about as reliable as the five-day weather prediction
  3. The system is 75% accurate, which is about as reliable as French trains are in terms of punctuality

In this work, we address the following research questions:

RQ1: How does the understanding of stated system accuracy affect reliance of users on the AI system?

RQ2: How does explaining stated system accuracy using analogies affect the reliance of users on the AI system?

What we did

To answer these questions, we proposed four hypotheses considering the effect of the stated accuracy level on user reliance, the effect of using analogies to explain accuracy measures on reliance, and two important user factors (numeracy level and familiarity with the analogy domain). We tested these hypotheses in an empirical study of human-AI collaborative decision making in a loan approval task. In this work, we present a between-subjects exploration (𝑁 = 281) as the main study to verify the proposed hypotheses. To ensure that our results do not suffer from the impact of domain-specific user characteristics (trust in and familiarity with the analogy domain) caused by individual user experiences, we conducted a further within-subjects study (𝑁 = 248) to investigate the effects of seeing different analogies.


Main Findings

We found that, at a 75% accuracy level, a well-understood stated accuracy is insufficient for users to calibrate their reliance on an AI system. Explaining stated system accuracy, even for users with low numeracy skills, had no significant effect on our (behavioral) reliance measure. We did find a limited effect of the successful use of analogies on subjective measures of trust in the system. However, this improvement in subjective measures did not translate to an improvement in reliance or performance. This suggests that the issue is not users' trust in the system, but an overestimation of their own skill at the task.

Participants also gave mixed feedback on the analogies used in our study. We show some examples in the table below.

Examples of participants' feedback on the analogies

Takeaway Message

Based on our findings from the two studies, we reason that the under-reliance on the AI system may be a result of users’ overestimation of their own ability to solve the given task. Thus, although familiar analogies can be effective in improving the intelligibility of the stated accuracy of the system, an improved understanding of system accuracy does not necessarily lead to improved system reliance and team performance.

For more details about our findings and their implications, please see our paper.
