ArXiv TLDR

Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe

arXiv: 2605.00607

Gaofei Shen, Martijn Bentum, Tom Lentz, Afra Alishahi, Grzegorz Chrupała

cs.CL · eess.AS

TLDR

This paper introduces an Encoding Probe to reconstruct language model representations, offering a new way to understand feature contributions beyond decodability.

Key contributions

  • Presents an 'Encoding Probe' to reconstruct language model representations from interpretable features.
  • Addresses two limitations of decoding probes: feature contributions cannot be directly compared, and feature correlations can confound results.
  • Evaluated on text and speech transformers using acoustic, phonetic, syntactic, and speaker ID features.
  • Shows that speaker-related effects vary strongly with training objective and dataset, while syntactic and lexical features contribute independently to reconstruction.

Why it matters

Understanding how language models represent information is crucial for their development. This new Encoding Probe offers a more nuanced way to analyze feature contributions, moving beyond simple decodability. It provides insights into how different features are encoded, which can guide future model design and interpretability efforts.

Original Abstract

Probing is widely used to study which features can be decoded from language model representations. However, the common decoding probe approach has two limitations that we aim to solve with our new encoding probe approach: contributions of different features to model representations cannot be directly compared, and feature correlations can affect probing results. We present an Encoding Probe that reverses this direction and reconstructs internal representations of models using interpretable features. We evaluate this method on text and speech transformer models, using feature sets spanning acoustics, phonetics, syntax, lexicon, and speaker identity. Our results suggest that speaker-related effects vary strongly across different training objectives and datasets, while syntactic and lexical features contribute independently to reconstruction. These results show that the Encoding Probe provides a complementary perspective on interpreting model representations beyond decodability.
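The abstract describes reversing the usual probing direction: instead of decoding features from representations, the probe predicts (reconstructs) hidden states from interpretable features. A minimal sketch of that idea, using synthetic data and a ridge regression from a feature matrix to hidden states (my illustration under assumed details, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 200 tokens, 5 interpretable features (e.g. syntactic,
# lexical), 32-dimensional hidden states. All values are synthetic.
n_tokens, n_features, hidden_dim = 200, 5, 32
features = rng.normal(size=(n_tokens, n_features))
true_map = rng.normal(size=(n_features, hidden_dim))
states = features @ true_map + 0.1 * rng.normal(size=(n_tokens, hidden_dim))

def encoding_probe_r2(X, Y, alpha=1.0):
    """Ridge-regress hidden states Y onto features X; return mean R^2
    across hidden dimensions (how well the features reconstruct Y)."""
    W = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ Y)
    resid = Y - X @ W
    ss_res = (resid ** 2).sum(axis=0)
    ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / ss_tot))

r2 = encoding_probe_r2(features, states)
print(f"mean R^2 across hidden dimensions: {r2:.3f}")
```

Comparing reconstruction quality with and without a given feature set gives a directly comparable measure of that feature's contribution, which is the kind of analysis the paper's probe is designed to support.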
