ArXiv TLDR

Contextual Linear Activation Steering of Language Models

2604.24693

Brandon Hsu, Daniel Beaglehole, Adityanarayanan Radhakrishnan, Mikhail Belkin

cs.CL

TLDR

CLAS dynamically adjusts LLM steering strength based on the input context, outperforming fixed-strength steering methods and matching fine-tuning approaches such as ReFT and LoRA in low-data settings.

Key contributions

  • Introduces Contextual Linear Activation Steering (CLAS) for dynamic, context-aware LLM steering.
  • Dynamically adapts steering strength, addressing limitations of fixed-strength methods.
  • Outperforms standard linear activation steering across 11 benchmarks and 4 model families.
  • Achieves performance comparable to ReFT and LoRA with limited labeled data.

Why it matters

Existing LLM steering methods apply a fixed steering strength to every input, which yields inconsistent quality across diverse prompts. CLAS adapts the strength to each input's context, making LLM specialization more consistent and effective, and provides a scalable, interpretable, and accurate way to tailor LLM behavior.

Original Abstract

Linear activation steering is a powerful approach for eliciting the capabilities of large language models and specializing their behavior using limited labeled data. While effective, existing methods often apply a fixed steering strength to all tokens, resulting in inconsistent steering quality across diverse input prompts. In this work, we introduce Contextual Linear Activation Steering (CLAS), a method that dynamically adapts linear activation steering to context-dependent steering strengths. Across eleven steering benchmarks and four model families, it consistently outperforms standard linear activation steering and matches or exceeds the performance of ReFT and LoRA in settings with limited labeled data. We therefore propose CLAS as a scalable, interpretable, and accurate method for specializing and steering large language models.
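The core contrast in the abstract, fixed versus context-dependent steering strength, can be illustrated with a small numerical sketch. This is not the paper's implementation: the per-token linear gate (`w`, `b`) below is a hypothetical parameterization chosen for clarity, and real steering would modify a transformer's hidden states (e.g., via forward hooks) rather than a NumPy array.

```python
import numpy as np

def fixed_steer(hidden, v, alpha):
    """Standard linear activation steering: add the same multiple
    of the steering direction v to every token's hidden state."""
    return hidden + alpha * v

def contextual_steer(hidden, v, w, b):
    """Illustrative context-dependent steering: the strength for each
    token is a linear function of that token's own hidden state.
    (w and b are hypothetical parameters, not the paper's actual form.)"""
    alpha = hidden @ w + b            # per-token strength, shape (seq_len,)
    return hidden + alpha[:, None] * v

# Toy example: 5 tokens, hidden size 8.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((5, 8))
v = rng.standard_normal(8)          # steering direction
out_fixed = fixed_steer(hidden, v, alpha=2.0)
out_ctx = contextual_steer(hidden, v, w=rng.standard_normal(8), b=0.1)
```

With `fixed_steer`, every token is shifted by the same vector `2.0 * v`; with `contextual_steer`, tokens whose hidden states already express (or resist) the target behavior receive a different shift, which is the intuition behind adapting the strength per input.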
