ArXiv TLDR

Conceptors for Semantic Steering

2605.04980

Ilias Triantafyllopoulos, Young-Min Cho, Ren Tao, Miranda Muqing Miao, Sunny Rai + 4 more

cs.LG, cs.CL

TLDR

This paper uses conceptors, soft projection matrices, for semantic steering of LLMs, offering a geometrically principled and compositional alternative to single-direction activation steering.

Key contributions

  • Uses conceptors, soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace for LLM steering.
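  • Proposes the conceptor quota as a parameter-free diagnostic for layer selection, predicting concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions.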
  • Develops a closed-form Boolean algebra (AND, OR, NOT) for compositional semantic steering with conceptors.
  • Demonstrates that conceptors match or outperform additive baselines at layers where concept subspaces are multi-dimensional, while producing substantially fewer degenerate outputs.
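
The contributions above can be made concrete with a small NumPy sketch. It assumes the standard conceptor formulation from the earlier conceptor literature, C = R (R + a^-2 I)^-1 with R the activation correlation matrix and a the aperture; this summary does not confirm that the paper uses exactly these formulas. The aperture value, the toy data, and the function names `conceptor`, `quota`, `c_not`, `c_and`, and `c_or` are illustrative assumptions.

```python
import numpy as np

def conceptor(X, aperture=10.0):
    """Soft projection matrix from activations X of shape (n_samples, d).

    Assumed standard form: C = R (R + aperture^-2 I)^-1, R = X^T X / n.
    """
    n, d = X.shape
    R = X.T @ X / n
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def quota(C):
    """Mean eigenvalue of C: the fraction of the space the conceptor claims."""
    return np.trace(C) / C.shape[0]

def c_not(C):
    """NOT C = I - C."""
    return np.eye(C.shape[0]) - C

def c_and(C, B):
    """C AND B = (C^-1 + B^-1 - I)^-1; this simple form needs C and B invertible."""
    I = np.eye(C.shape[0])
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - I)

def c_or(C, B):
    """C OR B via De Morgan: NOT(NOT C AND NOT B)."""
    return c_not(c_and(c_not(C), c_not(B)))

# Toy stand-ins for activations of two related sub-concepts (not real model data).
rng = np.random.default_rng(0)
scales_a = np.array([3.0, 1.0, 0.3, 0.1, 0.03])
scales_b = scales_a[::-1]
X_a = rng.normal(size=(200, 5)) * scales_a
X_b = rng.normal(size=(200, 5)) * scales_b

C = conceptor(X_a)
B = conceptor(X_b)
print("quota(C)      =", quota(C))
print("quota(C AND B)=", quota(c_and(C, B)))
print("quota(C OR B) =", quota(c_or(C, B)))
```

In this formulation AND can only shrink the quota and OR can only grow it, which is what makes the operations behave like intersection and union of concept subspaces.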

Why it matters

Most activation steering reduces each concept to a single direction. Conceptors instead retain the concept's full multidimensional subspace, which allows sub-concepts to be composed with Boolean operations and, in the paper's evaluation, yields substantially fewer degenerate outputs than additive baselines. The result is a more compositional and practically safer way to steer LLMs from a limited number of contrastive pairs.

Original Abstract

Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vector baseline. We further show that the conceptor quota provides a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions. Beyond selection, conceptors admit a closed-form Boolean algebra (AND, OR, NOT): we evaluate conceptor compositionality on thematically related sub-concepts. Across a systematic five-axis design-space evaluation, conceptors match or outperform additive baselines at layers where concept subspaces are multi-dimensional while producing substantially fewer degenerate outputs. Conceptor steering is a geometrically principled, compositional, and practically safer alternative to single-direction steering from a limited number of contrastive pairs.
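
As a rough illustration of the contrast the abstract draws, the sketch below compares a difference-of-means steering vector with a conceptor built from the same pooled activations. It checks one elementary fact: the difference of class means is a linear combination of the pooled activations, so it lies inside their span. This is not necessarily the paper's geometric argument. The update rule `h' = beta * C h`, the aperture of 10, and the toy low-rank data are assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 8, 3                                   # toy hidden size and concept rank
A = rng.normal(size=(k, d))                   # hypothetical latent-to-activation map
pos = (rng.normal(size=(40, k)) + 1.5) @ A    # activations for one pole
neg = (rng.normal(size=(40, k)) - 1.5) @ A    # activations for the other pole
X = np.vstack([pos, neg])                     # pooled across both poles

# Single-direction baseline: difference of class means.
v = pos.mean(axis=0) - neg.mean(axis=0)

# Orthogonal projector onto the span of the pooled activations.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
rank = int((s > 1e-10 * s[0]).sum())
V = Vt[:rank].T
P = V @ V.T

# v is a linear combination of rows of X, so projecting leaves it unchanged.
residual = np.linalg.norm(P @ v - v)

def steer_additive(h, v, alpha=1.0):
    """Additive baseline: shift the hidden state along one direction."""
    return h + alpha * v

def steer_conceptor(h, C, beta=1.0):
    """Assumed conceptor update: softly project h, then rescale by beta."""
    return beta * (C @ h)

R = X.T @ X / len(X)
C = R @ np.linalg.inv(R + 10.0 ** -2 * np.eye(d))

h = rng.normal(size=d)                        # stand-in for a hidden state
h_add = steer_additive(h, v)
h_con = steer_conceptor(h, C)
print("subspace rank:", rank, "residual of v outside span:", residual)
```

With `beta = 1` the conceptor update never increases the norm of the hidden state, whereas the additive update can push it arbitrarily far as `alpha` grows; that difference is one plausible reading of why fewer degenerate outputs are reported.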
