Can a Model Catch Its Own Hallucinations for Free? Label-Free Doubt Signals Hold Their Own Against a Labelled Dataset for Abstention
A model's own token-probability "doubt" signal, used to decide when to abstain, matches label-supervised abstention-tuning without using any correctness labels. Across six open-weights models (1B–8B, two families) on short-form QA, the label-free LoRA recipe shows no statistically detectable difference from the labelled one at matched coverage. The gain comes from calibration, not memorization, and the recipe's one blind spot is confidently wrong facts.
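To make the idea of a token-probability doubt signal concrete, here is a minimal sketch of abstaining at a fixed coverage. It is not the paper's recipe: the summary does not say how the signal is computed, so the mean negative log-probability used here is an assumption, as are the function names `doubt_score`, `threshold_for_coverage`, and `decide` and the toy log-probabilities. The LoRA tuning step is not shown.

```python
import math

def doubt_score(token_logprobs):
    """Assumed doubt signal: mean negative log-probability of the answer tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)

def threshold_for_coverage(scores, coverage):
    """Return the doubt threshold that answers roughly `coverage` of the questions."""
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(scores)
    k = max(1, math.ceil(coverage * len(ranked)))  # number of questions to answer
    return ranked[k - 1]

def decide(scores, threshold):
    """Answer when doubt is at or below the threshold, otherwise abstain."""
    return ["answer" if s <= threshold else "abstain" for s in scores]

# Hypothetical per-token log-probabilities for four generated answers.
examples = [
    [-0.05, -0.10],         # low doubt
    [-1.20, -2.30, -0.90],  # high doubt
    [-0.30, -0.20],         # low doubt
    [-2.50, -1.80],         # high doubt
]

scores = [doubt_score(lp) for lp in examples]
tau = threshold_for_coverage(scores, coverage=0.5)
decisions = decide(scores, tau)
print(decisions)
```

No correctness labels appear anywhere in this sketch: the threshold is set from the score distribution alone. It also illustrates the blind spot noted above, since a confidently wrong answer gets a low doubt score and is answered rather than abstained on.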