Conceptors for Semantic Steering
Ilias Triantafyllopoulos, Young-Min Cho, Ren Tao, Miranda Muqing Miao, Sunny Rai + 4 more
TLDR
This paper introduces conceptors, soft projection matrices for semantic steering of LLMs, as a geometrically principled and compositional alternative to single-direction steering.
Key contributions
- Introduces conceptors, soft projection matrices preserving full multidimensional concept subspaces for LLM steering.
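- Proposes the conceptor quota as a parameter-free diagnostic for layer selection; it predicts concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions.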
- Develops a closed-form Boolean algebra (AND, OR, NOT) for compositional semantic steering with conceptors.
- Demonstrates that conceptors match or outperform additive baselines at layers where concept subspaces are multi-dimensional, while producing substantially fewer degenerate outputs.
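The summary above does not give formulas, so the following is a minimal sketch based on the standard definitions from the general conceptor literature, not necessarily the paper's exact implementation. It assumes a conceptor of the form C = R (R + α⁻² I)⁻¹, where R is the activation correlation matrix and α is the aperture, and a quota equal to the mean singular value of C. The function names, the `aperture` values, and the random stand-in activations are all illustrative assumptions.

```python
import numpy as np

def conceptor(X, aperture=1.0):
    """Soft projection matrix from activations X of shape (n_samples, dim).

    Assumed standard form: C = R (R + aperture^-2 I)^-1, with R = X^T X / n.
    """
    n, d = X.shape
    R = X.T @ X / n
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def quota(C):
    """Mean singular value of C: the fraction of the space the conceptor claims."""
    return np.trace(C) / C.shape[0]

def NOT(C):
    return np.eye(C.shape[0]) - C

def AND(C, B):
    # This closed form assumes C and B are both invertible (full-rank R).
    I = np.eye(C.shape[0])
    return np.linalg.inv(np.linalg.inv(C) + np.linalg.inv(B) - I)

def OR(C, B):
    # De Morgan: C OR B = NOT(NOT C AND NOT B).
    return NOT(AND(NOT(C), NOT(B)))

# Hypothetical usage with random stand-ins for two sets of layer activations.
rng = np.random.default_rng(0)
acts_a = rng.normal(size=(200, 4))
acts_b = rng.normal(size=(200, 4)) * np.array([2.0, 1.0, 0.5, 0.1])
C_a = conceptor(acts_a)
C_b = conceptor(acts_b)
print("quota(a):", quota(C_a))
print("quota(a OR b):", quota(OR(C_a, C_b)))
print("quota(a AND b):", quota(AND(C_a, C_b)))
```

Under these definitions, OR behaves like adding the two correlation matrices before forming the conceptor, and AND can only shrink a conceptor, which is what makes the operations read like set union and intersection on concept subspaces.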
Why it matters
Most activation-steering methods reduce a concept to a single direction and leave its geometry largely unexamined. Conceptors instead preserve the concept's full multidimensional subspace, which supports compositional steering through Boolean operations and, in the paper's evaluation, produces substantially fewer degenerate outputs than additive baselines.
Original Abstract
Activation-based steering provides control of LLM behavior at inference time, but the dominant paradigm reduces each concept to a single direction whose geometry is left largely unexamined. Rather than selecting a single steering direction, we use conceptors: soft projection matrices estimated from activations pooled across both poles of a bipolar concept, which preserve the concept's full multidimensional subspace. A geometric analysis shows the bipolar subspace strictly subsumes the single-vector baseline. We further show that the conceptor quota provides a parameter-free layer-selection diagnostic, predicting concept separability with Pearson correlations up to r=0.96 across three instruction-tuned models and three semantic dimensions. Beyond selection, conceptors admit a closed-form Boolean algebra (AND, OR, NOT): we evaluate conceptor compositionality on thematically related sub-concepts. Across a systematic five-axis design-space evaluation, conceptors match or outperform additive baselines at layers where concept subspaces are multi-dimensional while producing substantially fewer degenerate outputs. Conceptor steering is a geometrically principled, compositional, and practically safer alternative to single-direction steering from a limited number of contrastive pairs.
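One way to read the claim that the bipolar subspace subsumes the single-vector baseline is sketched below; the abstract does not spell out the paper's actual analysis, so this is an assumption. A difference-of-means steering vector is a linear combination of the pooled activations, so a conceptor built from both poles (assumed here to be C = R (R + α⁻² I)⁻¹ at a large aperture) should leave that vector almost unchanged while attenuating an unrelated direction. The synthetic activations, dimensions, and helper names are hypothetical.

```python
import numpy as np

def conceptor(X, aperture):
    """Assumed standard form: C = R (R + aperture^-2 I)^-1, with R = X^T X / n."""
    n, d = X.shape
    R = X.T @ X / n
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(d))

def relative_residual(C, v):
    """Fraction of v's norm that the soft projection C removes."""
    return np.linalg.norm(C @ v - v) / np.linalg.norm(v)

rng = np.random.default_rng(0)
d, n_pairs = 64, 8                      # few contrastive pairs, high dimension
offset = rng.normal(size=d)             # hypothetical concept direction
pos = rng.normal(size=(n_pairs, d)) + offset   # synthetic positive-pole activations
neg = rng.normal(size=(n_pairs, d)) - offset   # synthetic negative-pole activations

# Single-direction baseline: difference of class means.
steering_vector = pos.mean(axis=0) - neg.mean(axis=0)

# Bipolar conceptor: estimated from activations pooled across both poles.
C = conceptor(np.vstack([pos, neg]), aperture=1e3)

random_vector = rng.normal(size=d)
print("residual, steering vector:", relative_residual(C, steering_vector))
print("residual, random vector:  ", relative_residual(C, random_vector))
```

In this toy setup the steering vector passes through the conceptor nearly intact, while the random vector loses most of its norm, because the pooled subspace has at most 16 of the 64 dimensions.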