Contextual Linear Activation Steering of Language Models
Brandon Hsu, Daniel Beaglehole, Adityanarayanan Radhakrishnan, Mikhail Belkin
TLDR
CLAS dynamically adjusts LLM steering strength based on context, improving on fixed-strength steering and matching ReFT and LoRA in limited-data settings.
Key contributions
- Introduces Contextual Linear Activation Steering (CLAS) for dynamic, context-aware LLM steering.
- Dynamically adapts steering strength, addressing limitations of fixed-strength methods.
- Outperforms standard linear activation steering across 11 benchmarks and 4 model families.
- Achieves performance comparable to ReFT and LoRA with limited labeled data.
Why it matters
Existing LLM steering methods struggle with diverse inputs because they apply a fixed steering strength to every prompt. CLAS adapts that strength to context, making LLM specialization more consistent and effective, and offers a scalable, interpretable, and accurate way to tailor LLM behavior.
Original Abstract
Linear activation steering is a powerful approach for eliciting the capabilities of large language models and specializing their behavior using limited labeled data. While effective, existing methods often apply a fixed steering strength to all tokens, resulting in inconsistent steering quality across diverse input prompts. In this work, we introduce Contextual Linear Activation Steering (CLAS), a method that dynamically adapts linear activation steering to context-dependent steering strengths. Across eleven steering benchmarks and four model families, it consistently outperforms standard linear activation steering and matches or exceeds the performance of ReFT and LoRA in settings with limited labeled data. We therefore propose CLAS as a scalable, interpretable, and accurate method for specializing and steering large language models.
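The core contrast the abstract draws — a fixed steering coefficient versus a context-dependent, per-token one — can be sketched in a few lines of tensor code. This is a minimal illustration with made-up shapes and a hypothetical strength rule, not the paper's actual implementation; the function `alpha` and the clamp-based rule are assumptions for demonstration only:

```python
import torch

torch.manual_seed(0)

# Hypothetical shapes: batch of 2 prompts, 5 tokens, hidden size 8.
hidden = torch.randn(2, 5, 8)

# A unit steering direction, e.g. obtained from contrastive examples
# as in standard linear activation steering (assumption).
v = torch.randn(8)
v = v / v.norm()

# Fixed-strength steering: the same scalar shift for every token.
fixed = hidden + 4.0 * v

# Context-dependent steering (CLAS-style, sketched): scale the shift
# per token. Here the strength shrinks for tokens already aligned
# with v — an illustrative rule, not the paper's learned one.
proj = (hidden @ v).unsqueeze(-1)         # (2, 5, 1) per-token alignment
alpha = torch.clamp(4.0 - proj, min=0.0)  # hypothetical strength rule
contextual = hidden + alpha * v           # broadcasts over the hidden dim
```

The point of the sketch is only the shape of the intervention: fixed steering adds one global multiple of `v`, while the contextual variant computes a separate nonnegative coefficient for each token position before adding the same direction.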