Intuit AI Research Debuts Novel Approach to Reliable Hallucination Detection in Black Box Language Models at EMNLP 2023

Jiaxin Zhang
Intuit Engineering
Dec 13, 2023 · 4 min read

This blog is co-authored by Jiaxin Zhang, staff research scientist; Kamalika Das, manager, AI Research Program; and Kumar Sricharan, VP and chief architect for AI at Intuit.

Of the many questions surrounding the future of artificial intelligence, hallucination detection has received a substantial amount of attention. Solving this problem is critical to improving the trustworthiness of modern language models (LMs).

Large-scale pre-trained “black box” LMs have demonstrated exceptional adaptability across a diverse array of natural language tasks that require generating open-ended responses based on user prompt comprehension. However, prominent LMs like GPT (Brown et al., 2020. Language models are few-shot learners. Advances in Neural Information Processing Systems, 33:1877–1901) and PaLM (Chowdhery et al., 2022. PaLM: Scaling language modeling with pathways. arXiv preprint arXiv:2204.02311) often exhibit a tendency to produce exceedingly confident yet false assertions, commonly referred to as hallucinations. This phenomenon significantly limits their use in domains where factual accuracy is of the utmost importance.

The Intuit AI Research Program team set out to solve this problem by re-examining existing detection approaches based on the self-consistency of LMs, and uncovered two types of hallucination, arising at the question level and the model level, that cannot be effectively identified through self-consistency checks alone. They tackled this work in collaboration with Professor Bradley Malin and doctoral student Zhuohang Li at Vanderbilt University.

We investigated the relationship between the self-consistency of LMs and the occurrence of hallucinations across a diverse range of tasks. Our findings indicate that, while self-inconsistency in LMs often coincides with hallucination, self-consistency does not guarantee factual accuracy. These findings challenge the notion that self-consistency alone can serve as a reliable indicator of veracity, as we demonstrate that LMs can exhibit various tiers of hallucination that elude detection through self-consistency checks.
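For readers unfamiliar with the baseline, here is a minimal sketch of what a sampling-based self-consistency check looks like; the `query_lm` helper is a placeholder for a call to any black-box LM API, not Intuit's implementation:

```python
# Minimal sketch of a sampling-based self-consistency check (illustrative only).
from collections import Counter

def query_lm(prompt: str, temperature: float = 1.0) -> str:
    """Placeholder for a call to a black-box LM API; swap in your own client."""
    raise NotImplementedError

def self_consistency_score(question: str, n_samples: int = 5) -> float:
    """Sample the same question several times and measure agreement.

    Returns the fraction of sampled answers that match the most common
    answer; a low score signals self-inconsistency, which often (but not
    always) coincides with hallucination.
    """
    answers = [query_lm(question, temperature=1.0).strip().lower()
               for _ in range(n_samples)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / n_samples
```

The limitation our findings expose is exactly the failure mode this score cannot see: an LM that confidently repeats the same wrong answer will look perfectly consistent.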

One such tier is question-level hallucination, where LMs consistently generate incorrect answers in response to specific questions. We show that reformulating such questions can mitigate these hallucinations. Our work further reveals the existence of model-level hallucination, whereby different LMs differ in their propensity to hallucinate. Surprisingly, we even observe cases where smaller LMs correctly answer questions on which larger LMs hallucinate. Together, these findings accentuate the need to consider model-specific characteristics when assessing the occurrence of hallucinations.

With this in mind, we developed SAC3 (semantic-aware cross-check consistency), a new sampling-based approach to improve the detection of hallucinations in black-box LMs. To address question-level hallucination, we introduce a mechanism that perturbs a question into semantically equivalent variants and evaluates the consistency of the LM’s responses across those variants. By examining the answers generated for the perturbed questions, we can identify cases where the LM consistently provides incorrect responses to a specific question, which is indicative of a question-level hallucination. To address model-level hallucination, we introduce cross-model response consistency checking, which compares the responses of different LMs to the same set of questions. By identifying discrepancies between the responses of different models, we can pinpoint cases where certain models hallucinate while others provide correct answers.
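To make the two cross-checks concrete, here is a simplified sketch in Python; the `paraphrase`, `answers_agree`, and model-query helpers are illustrative stand-ins rather than the exact scoring procedure from the paper and open-source repo:

```python
# Simplified sketch of the two SAC3-style cross-checks described above.
# `paraphrase` and `answers_agree` are hypothetical helpers: one rephrases a
# question into semantically equivalent variants, the other judges whether two
# answers are semantically equivalent (e.g., via an LM judge).

def paraphrase(question: str, n_variants: int = 3) -> list[str]:
    """Generate semantically equivalent rephrasings of the question."""
    raise NotImplementedError

def answers_agree(a: str, b: str) -> bool:
    """Judge whether two answers are semantically equivalent."""
    raise NotImplementedError

def cross_question_consistency(query_lm, question: str, original_answer: str) -> float:
    """Question-level check: ask the same LM perturbed versions of the question."""
    variants = paraphrase(question)
    matches = [answers_agree(original_answer, query_lm(v)) for v in variants]
    return sum(matches) / len(matches)

def cross_model_consistency(query_other_lm, question: str, original_answer: str) -> float:
    """Model-level check: ask a different LM the original and perturbed questions."""
    variants = [question] + paraphrase(question)
    matches = [answers_agree(original_answer, query_other_lm(v)) for v in variants]
    return sum(matches) / len(matches)

# Low consistency on either check flags a likely hallucination even when the
# target model is perfectly self-consistent on the original question.
```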

By integrating these cross-checking extensions into our approach at Intuit, we can detect hallucinations that elude self-consistency checks, providing a more comprehensive assessment of question-level and model-level hallucinations. On the two classification question answering (QA) tasks, our approach achieves Area Under the Receiver Operating Characteristic Curve (AUROC) scores, a standard measure of how well a detector separates hallucinated from factual responses, of 99.4% and 97.0%, respectively, significantly outperforming the self-consistency baseline. On the two open-domain generation QA tasks, our approach achieves AUROC scores of 88.0% and 77.2%, substantial improvements over the self-consistency baseline (+13.8% and +6.7%, respectively).
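For context on the metric, the toy snippet below shows how an AUROC score is computed from a detector's outputs using scikit-learn; the labels and scores here are placeholders, not data from the paper:

```python
# Toy example of computing AUROC for a hallucination detector with scikit-learn.
# Labels mark whether each response was actually a hallucination (1) or factual (0);
# scores are the detector's outputs (e.g., 1 - consistency). Placeholder numbers only.
from sklearn.metrics import roc_auc_score

labels = [1, 0, 1, 0, 0, 1, 0, 1]                    # ground-truth hallucination labels
scores = [0.9, 0.2, 0.7, 0.1, 0.4, 0.8, 0.3, 0.6]    # detector scores per response

print(f"AUROC: {roc_auc_score(labels, scores):.3f}")  # prints 1.000: perfect separation
```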

Looking forward, there is much work to be done in reducing the incidence of hallucinations in LMs, and we believe this research may contribute to the development of more accurate and reliable LMs by mitigating the risks of misinformation and biased outputs and by promoting accountability and trust in AI systems. For a deeper dive, see our paper “SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency,” presented at EMNLP (Empirical Methods in Natural Language Processing) 2023, and our open-source code on the Intuit GitHub.

_________________________________________________________________

Intuit’s AI Research Program is an intrapreneurial function within the company that pushes the boundaries of AI. We develop and incubate AI-driven technology breakthroughs to solve our customers’ most important financial problems.

We’re a diverse team of research scientists, data scientists, and engineers with extensive expertise in AI, including natural language processing, generative AI, robust and explainable AI, symbolic AI, machine learning, and optimization.

To connect with us about open roles, partnerships or collaborations, contact ai-research@intuit.com
