Whose Reasoning Is It? Distilled Reasoning Models Are Faithful to Provided Chains but Only Sparsely to Their Own

Asaria, Salomone, Gandhi · June 25, 2026 · LLM · INTERPRETABILITY

Reasoning models write a chain-of-thought (CoT) before answering, and that trace is increasingly read as an explanation. We ask whether the answer's probability actually depends, under intervention, on the reasoning steps, and whether this differs when the reasoning is provided to the model versus generated by it. Using two distilled reasoners (DeepSeek-R1-Distill-Qwen 1.5B and 7B), we measure per-step causal CoT use with a control-differenced answer-logprob intervention, on a synthetic arithmetic task with ground-truth load-bearing steps and on the models' own generated GSM8K reasoning. When a chain is provided, the models track it tightly: corrupting a load-bearing step collapses the log-probability of the correct answer (by 7 to 9 nats) while corrupting an irrelevant step moves it by essentially zero, separating the two on over 99 percent of problems. When the models reason for themselves, the dependence is real but sparse: only about half of generated steps clear a load-bearing threshold, and that fraction rises through the trace, including within individual problems. The model-size effect is significant on the provided task (paired difference 1.45 nats, 95 percent CI [1.02, 1.88]) and suggestive on generated reasoning. We also find the verdict is method-relative (corruption and ablation disagree), and that clean activation-patching localization is blocked on synthetic arithmetic by a tension between task competence and the restated values that competence relies on.
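To make the per-step measurement concrete, here is a minimal sketch of one plausible reading of a control-differenced answer-logprob intervention. The abstract does not spell out the procedure, so everything here is an assumption: the effect of a step is taken as the drop in the correct answer's log-probability when the step is corrupted, minus the drop when it is replaced by a meaning-preserving control edit. The names `control_differenced_effect`, `load_bearing_fraction`, and `toy_score`, the threshold of 1.0 nat, and the toy scorer itself are hypothetical stand-ins; a real run would score answers with the model's own log-probabilities.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer type: maps a list of reasoning steps to
# log p(correct answer | question, steps), in nats.
AnswerLogProb = Callable[[Sequence[str]], float]


def replace_step(steps: Sequence[str], index: int, new_step: str) -> List[str]:
    """Return a copy of the chain with one step swapped out."""
    edited = list(steps)
    edited[index] = new_step
    return edited


def control_differenced_effect(
    score: AnswerLogProb,
    steps: Sequence[str],
    index: int,
    corrupted: str,
    control: str,
) -> float:
    """Drop in answer log-prob from corrupting a step, minus the drop
    caused by a control edit that leaves the step's content intact."""
    base = score(steps)
    drop_corrupted = base - score(replace_step(steps, index, corrupted))
    drop_control = base - score(replace_step(steps, index, control))
    return drop_corrupted - drop_control


def load_bearing_fraction(effects: Sequence[float], threshold: float) -> float:
    """Fraction of steps whose effect exceeds the (assumed) threshold."""
    return sum(e > threshold for e in effects) / len(effects)


def toy_score(steps: Sequence[str]) -> float:
    """Stand-in for a model: the answer is likely only if some step
    states the correct intermediate value. Not a real language model."""
    return -0.1 if any("= 7" in s for s in steps) else -8.0


chain = ["3 + 4 = 7", "The sky is blue."]
effects = [
    # Load-bearing step: corrupt the value vs. rephrase it.
    control_differenced_effect(toy_score, chain, 0, "3 + 4 = 9", "Adding, 3 + 4 = 7"),
    # Irrelevant step: corrupt it vs. rephrase it.
    control_differenced_effect(toy_score, chain, 1, "The sky is green.", "The sky looks blue."),
]
fraction = load_bearing_fraction(effects, threshold=1.0)  # threshold is an assumption
```

In this toy setup the first step carries a large effect and the second carries none, which mirrors the qualitative pattern reported for provided chains. Subtracting the control drop is meant to remove the part of the change that comes merely from editing the text rather than from changing what the step says.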
