ArXiv TLDR

Implicit Representations of Grammaticality in Language Models

2605.05197

Yingshan Susan Wang, Linlu Qiu, Zhaofeng Wu, Roger P. Levy, Yoon Kim

cs.CL

TLDR

Language models implicitly learn a distinct sense of grammaticality in their internal layers, separate from mere string probability, as revealed by a linear probe.

Key contributions

  • A linear probe on LM hidden states effectively identifies grammaticality, outperforming direct string probabilities.
  • The probe generalizes to human benchmarks and shows surprising cross-lingual transfer for grammaticality.
  • This implicit grammaticality distinction is separate from semantic plausibility and weakly correlated with string probability.

Why it matters

This research shows that large language models develop a sense of grammaticality that goes beyond raw string probability. It opens new avenues for understanding and improving how LMs process and generate human-like language, potentially leading to more robust and grammatically sound AI systems.

Original Abstract

Grammaticality and likelihood are distinct notions in human language. Pretrained language models (LMs), which are probabilistic models of language fitted to maximize corpus likelihood, generate grammatically well-formed text and discriminate well between grammatical and ungrammatical sentences in tightly controlled minimal pairs. However, their string probabilities do not sharply discriminate between grammatical and ungrammatical sentences overall. But do LMs implicitly acquire a grammaticality distinction distinct from string probability? We explore this question through studying internal representations of LMs, by training a linear probe on a dataset of grammatical and (synthetic) ungrammatical sentences obtained by applying perturbations to a naturalistic text corpus. We find that this simple grammaticality probe generalizes to human-curated grammaticality judgment benchmarks and outperforms LM probability-based grammaticality judgments. When applied to semantic plausibility benchmarks, in which both members of a minimal pair are grammatical and differ in only plausibility, the probe however performs worse than string probability. The English-trained probe also exhibits nontrivial cross-lingual generalization, outperforming string probabilities on grammaticality benchmarks in numerous other languages. Additionally, probe scores correlate only weakly with string probabilities. These results collectively suggest that LMs acquire to some extent an implicit grammaticality distinction within their hidden layers.
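The probing setup described in the abstract — a linear classifier trained on hidden-state vectors of grammatical versus perturbed sentences — can be sketched as follows. This is a minimal illustration, not the authors' code: synthetic random vectors stand in for real LM hidden states, and the class separation is injected artificially so the probe has something to find. In a real replication, the feature vectors would come from a chosen layer of a pretrained LM.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d, n = 64, 200  # stand-ins for hidden-state dimensionality and dataset size

# Synthetic stand-ins for pooled hidden states of grammatical (label 1) and
# perturbed/ungrammatical (label 0) sentences. The shared `direction` vector
# simulates an implicit grammaticality axis in representation space.
direction = rng.normal(size=d)
X_gram = rng.normal(size=(n, d)) + 0.8 * direction
X_ungram = rng.normal(size=(n, d)) - 0.8 * direction
X = np.vstack([X_gram, X_ungram])
y = np.concatenate([np.ones(n), np.zeros(n)])

# Train on one half, evaluate on the other, mirroring the paper's recipe of
# fitting a simple linear probe and testing its generalization.
idx = rng.permutation(2 * n)
train, test = idx[:n], idx[n:]
probe = LogisticRegression(max_iter=1000).fit(X[train], y[train])
acc = probe.score(X[test], y[test])
```

Because the synthetic classes are linearly separable along `direction`, the probe's held-out accuracy here is near 1.0; with real LM activations, accuracy would instead measure how linearly decodable grammaticality actually is at that layer.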

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.