Parsimony, Not the Clip: What Controls the Search in Reinforcement-Learning Symbolic Regression

Asaria, Salomone, Gandhi · June 23, 2026 · RL, LLM

A mechanistic study of how an RL objective shapes symbolic-regression search. The parsimony coefficient λ cleanly and monotonically sets the operating point on the accuracy–parsimony frontier, while the DAPO clip-higher asymmetry does not: it is the entropy regularizer, not the clip, that sustains exploration. On held-out Feynman, the final model recovers a modest fraction of the equations (symbolic recovery 0.205), in the range of a deep-SR baseline and below the GP incumbent PySR.
