Ryan Cotterell

2 papers · Latest: May 1, 2026

Characterizing the Expressivity of Local Attention in Transformers

This paper formally explains why local attention improves Transformer quality: it shows that local attention adds expressive power, which gives hybrid models an advantage.

This paper disentangles unit definition from tokenization in surprisal theory, proposing a unified framework for consistent linguistic analysis.
