Is Instruction-Tuning More Brain-Aligned? Mostly a Chat-Template Artifact

Asaria, Salomone, Gandhi·June 23, 2026LLMINTERPRETABILITY

Reports that instruction-tuned models are more "brain-aligned" than their base versions are mostly a chat-template artifact, not a property of alignment training. Under identical raw text, post-training weight changes leave fMRI encoding alignment essentially unchanged (Qwen base→Instruct p=0.92), while merely applying the chat template significantly raises apparent alignment — even for the base model.

Download PDF ↓