Finding Truth in LLMs: UC Berkeley & Peking U Propose Unsupervised Contrast-Consistent Search

Synced · Published in SyncedReview · 3 min read · Dec 15, 2022

Powerful large language models (LLMs) now play essential roles in many real-world applications. But as humans become increasingly dependent on LLMs, some are questioning whether, and to what extent, we can trust them to deliver the “truth.”

In the new paper Discovering Latent Knowledge in Language Models Without Supervision, a research team from UC Berkeley and Peking University presents Contrast-Consistent Search (CCS), an unsupervised approach for discovering latent knowledge in language models.

The research team argues there are a number of ways conventional LLMs can become “misaligned with the truth.” A model trained via imitation learning may simply reproduce the inefficiencies and errors of its human demonstrators. A model whose outputs are rated by humans (reward optimization) may produce text that is coherent and compelling yet contains errors that human raters cannot detect.

To circumvent these issues, the team focuses not on explicit truth labels but on the implicit, internal “beliefs” or “knowledge” a model learns, which it surfaces through the introduction of Contrast-Consistent Search (CCS).
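For readers who want a more concrete picture, below is a minimal PyTorch-style sketch of the core CCS idea described in the paper: a small probe is trained on a model’s hidden representations of contrasting “Yes”/“No” answers so that the two predicted probabilities are consistent (they should sum to one) and confident (they should not both collapse to 0.5). The probe class, tensor shapes, and training loop here are illustrative assumptions, not the authors’ reference implementation.

```python
import torch
import torch.nn as nn

class CCSProbe(nn.Module):
    """Linear probe mapping a hidden state to a probability of 'true'.
    (Illustrative: a simple linear layer followed by a sigmoid.)"""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, 1)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.linear(h)).squeeze(-1)

def ccs_loss(p_pos: torch.Tensor, p_neg: torch.Tensor) -> torch.Tensor:
    """Unsupervised CCS objective over a batch of contrast pairs.
    p_pos: probe output for the 'Yes' completion of each question.
    p_neg: probe output for the 'No' completion of the same question.
    """
    # Consistency: the two answers should have complementary probabilities.
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    # Confidence: discourage the degenerate solution p_pos = p_neg = 0.5.
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

# Hypothetical usage: h_pos / h_neg stand in for hidden states extracted
# from an LLM for the "Yes" and "No" versions of a batch of questions.
hidden_dim = 768
h_pos = torch.randn(32, hidden_dim)  # placeholder features
h_neg = torch.randn(32, hidden_dim)

probe = CCSProbe(hidden_dim)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)

for step in range(100):
    optimizer.zero_grad()
    loss = ccs_loss(probe(h_pos), probe(h_neg))
    loss.backward()
    optimizer.step()
```

At inference time, the paper averages the two probe outputs, 0.5 · (p_pos + (1 − p_neg)), to decide whether the model’s internal representations treat a statement as true, with no labelled data involved at any stage.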
