Neither parallel nor sequential: how DiffusionGemma actually commits tokens
An inference-only interpretability study of a shipped diffusion language model: we hooked its own commit mechanism and watched the order it finalizes tokens in.
Quick summary: The usual mental model of a diffusion language model is that it fills in text in parallel, all at once, independent of position. We instrumented Google's DiffusionGemma, without training or changing anything, to watch the order in which it actually finalizes tokens. The surprise: it matches neither picture. It is not the clean parallel fill we assume, and it is not secretly left-to-right either. Instead, it has a partial, granularity-dependent bias, commits tokens in big batches, and behaves differently depending on what you ask it to write. Separately, we found that the model leans on an internal sense of confidence to finalize tokens early and to stop generating ahead of its budget, and that this confidence is a reliable signal of correctness in some domains (math) but not in others (factual recall).
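
To make "partial, granularity-dependent bias" concrete, here is a minimal sketch of one way such a bias could be scored. It is an illustration, not necessarily the metric used in this study. It assumes you already have, for each output position, the denoising step at which that token was finalized (the hypothetical list `commit_steps`), and it computes a Spearman rank correlation between position and commit step. A value near 1.0 means strictly left-to-right, and a value near 0 means commit order is unrelated to position. The function names and the `block_size` parameter are made up for this example.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank.

    Ties matter here because tokens committed in the same batch share a step.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 when either side is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # no ordering information, e.g. everything committed at once
    return cov / (vx * vy) ** 0.5


def order_bias(commit_steps, block_size=1):
    """Spearman correlation between position and commit step.

    commit_steps[i] is the (hypothetical) step at which position i was finalized.
    block_size > 1 averages commit steps over consecutive blocks first, so the
    same trace can be scored at a coarser granularity.
    """
    blocks = [
        mean(commit_steps[i:i + block_size])
        for i in range(0, len(commit_steps), block_size)
    ]
    positions = list(range(len(blocks)))
    return pearson(average_ranks(positions), average_ranks(blocks))


# Toy trace: the first half is finalized before the second half, but the
# order within each half is not left-to-right.
toy_steps = [1, 0, 1, 0, 3, 2, 3, 2]
print(order_bias(toy_steps))                # token-level score
print(order_bias(toy_steps, block_size=4))  # block-level score
```

On the toy trace the block-level score is higher than the token-level score, which is the sense in which a bias can depend on granularity: the same generation can look ordered when viewed in chunks and much less ordered when viewed token by token.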
