Teaching a language model to say "I'm not sure" using its own doubt

8 min read

A model already has a quiet sense of when it is bluffing. We turned that sense into an off-switch for confident wrong answers, without ever telling it which answers were right.

Imagine you're a student, and the teacher asks you a math question in front of the class. As you start to answer, you have an internal feeling about how well you actually know this topic. That feeling shapes everything: how quickly you speak, how long you pause to think, whether you hedge or commit. You aren't just producing an answer; you're monitoring your own confidence as you go and adjusting based on how you feel about the thoughts forming in your mind.