ArXiv TLDR

Manifold Steering Reveals the Shared Geometry of Neural Network Representation and Behavior

2605.05115

Daniel Wurgaft, Can Rager, Matthew Kowal, Vasudev Shyam, Sheridan Feucht + 11 more

cs.LG

TLDR

This paper shows that neural network representation geometry causally shapes behavior, enabling principled control through "manifold steering."

Key contributions

  • "Manifold steering" along activation geometry produces natural behavioral trajectories.
  • Linear steering, assuming Euclidean geometry, yields unnatural model outputs.
  • Optimizing interventions in activation space to produce desired behavioral trajectories recovers paths that trace the activation manifold's curvature.
  • Validated across language models (reasoning and in-context learning tasks) and a video world model (physical dynamics).
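The contrast between the first two contributions can be illustrated with a toy example (ours, not the paper's implementation): on a circular activation manifold, a linear interpolation cuts through the interior and leaves the manifold, while a geodesic (spherical) interpolation stays on it.

```python
# Toy sketch of manifold vs. linear steering on a unit circle.
# This is an illustration under our own assumptions, not the paper's code.
import numpy as np

def lerp(a, b, t):
    """Linear steering: a straight line through activation space."""
    return (1 - t) * a + t * b

def slerp(a, b, t):
    """Manifold steering on a circle/sphere: follow the geodesic arc."""
    cos_omega = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

# Two unit "activations" 90 degrees apart on the radius-1 manifold.
a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

mid_linear = lerp(a, b, 0.5)     # norm ~0.707: the path leaves the manifold
mid_manifold = slerp(a, b, 0.5)  # norm 1.0: the path stays on the circle
```

The off-manifold midpoint is the geometric analogue of the "unnatural outputs" the paper attributes to linear steering.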

Why it matters

This work establishes that the geometric structure of neural representations is not incidental but causally linked to model behavior. It redefines the problem of steering neural networks from finding directions to understanding and leveraging the underlying geometry, enabling more principled and effective control.

Original Abstract

Neural representations carry rich geometric structure; but does that structure causally shape behavior? To address this question, we intervene along paths through activation space defined by different geometries, and measure the behavioral trajectories they induce. In particular, we test whether interventions that respect the geometry of activation space will yield behaviors close to those the model exhibits naturally. Concretely, we first fit an activation manifold $M_h$ to representations and a behavior manifold $M_y$ to output probability distributions. We then test the link $M_h \leftrightarrow M_y$ via interventions: we find that steering along $M_h$, which we term manifold steering, yields behavioral trajectories that follow $M_y$, while linear steering -- which assumes a Euclidean geometry -- cuts through off-manifold regions and hence produces unnatural outputs. Moreover, optimizing interventions in activation space to produce paths along $M_y$ recovers activation trajectories that trace the curvature of $M_h$. We demonstrate this bidirectional relationship between the geometry of representation and behavior across tasks and modalities. In language models, we use reasoning tasks with cyclic and sequential geometries as well as in-context learning tasks with more complex graph geometries. In a video world model, we use a task with geometry corresponding to physical dynamics. Overall, our work shows that geometry in neural representation is not merely incidental, but is in fact the proper object for enabling principled control via intervention on internals. This recasts the core problem of steering from finding the right direction to finding the right geometry.
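One way to picture the abstract's recipe of fitting an activation manifold $M_h$ and steering along it is to re-project each linear steering step onto a manifold fit from sampled activations. The projection scheme below (snapping to the mean of the k nearest sampled points) and all names are our hypothetical sketch, not the paper's method.

```python
# Hypothetical sketch: approximate manifold steering by projecting each
# linear step back onto a manifold fit to sampled activations (k-NN mean).
import numpy as np

def project_to_manifold(point, samples, k=10):
    """Snap a point onto the fitted manifold: average of its k nearest
    sampled activations (a crude local-structure projection)."""
    dists = np.linalg.norm(samples - point, axis=1)
    nearest = samples[np.argsort(dists)[:k]]
    return nearest.mean(axis=0)

def manifold_steer(start, end, samples, n_steps=20):
    """Walk from start to end, re-projecting after each linear step so
    the path hugs the fitted activation manifold M_h."""
    path = [start]
    for t in np.linspace(0.0, 1.0, n_steps)[1:]:
        step = (1 - t) * start + t * end
        path.append(project_to_manifold(step, samples))
    return np.stack(path)

# Points on a unit circle stand in for sampled model activations.
theta = np.linspace(0, 2 * np.pi, 500, endpoint=False)
samples = np.stack([np.cos(theta), np.sin(theta)], axis=1)

# Steer a quarter of the way around; every point on the steered path
# stays near radius 1, i.e. on-manifold.
path = manifold_steer(samples[0], samples[125], samples)
```

In the paper's cyclic reasoning tasks, staying near the fitted manifold is exactly what distinguishes behavioral trajectories that follow $M_y$ from the off-manifold shortcuts taken by linear steering.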
