ArXiv TLDR

Lost in the Middle: How Language Models Use Long Contexts

arXiv: 2307.03172

Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua + 2 more

cs.CL

TLDR

This paper reveals that language models struggle to effectively utilize relevant information located in the middle of long input contexts, often performing best when key information is at the beginning or end.

Key contributions

  • Demonstrates significant performance degradation when relevant context is positioned in the middle of long inputs.
  • Analyzes language model behavior on multi-document QA and key-value retrieval tasks requiring long context understanding.
  • Introduces new evaluation protocols to better assess long-context usage in language models.

Why it matters

Understanding how language models process long contexts is increasingly important as models are built to handle ever larger inputs. This paper identifies a critical limitation: models make poor use of information positioned in the middle of the context. That finding affects model reliability in practice and points future research toward better context utilization and better evaluation protocols for long-context language models.

Original Abstract

While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context. We analyze the performance of language models on two tasks that require identifying relevant information in their input contexts: multi-document question answering and key-value retrieval. We find that performance can degrade significantly when changing the position of relevant information, indicating that current language models do not robustly make use of information in long input contexts. In particular, we observe that performance is often highest when relevant information occurs at the beginning or end of the input context, and significantly degrades when models must access relevant information in the middle of long contexts, even for explicitly long-context models. Our analysis provides a better understanding of how language models use their input context and provides new evaluation protocols for future long-context language models.
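The key-value retrieval task described above can be sketched as a small harness: build a JSON object of random key-value pairs, place the target pair at a chosen position, and ask the model for the value of the target key. This is a minimal illustration based on the paper's description (random UUID-style keys and values in a JSON object); the exact prompt wording and the function name here are assumptions, not the authors' released code.

```python
import json
import random
import uuid

def make_kv_retrieval_prompt(num_pairs: int, target_position: int, seed: int = 0):
    """Build a synthetic key-value retrieval prompt.

    Returns (prompt, target_key, target_value), with the target pair
    placed at `target_position` within the serialized JSON object.
    """
    rng = random.Random(seed)
    # Random 128-bit UUIDs serve as both keys and values, so retrieval
    # cannot rely on any semantic cue -- only on locating the key.
    pairs = [
        (str(uuid.UUID(int=rng.getrandbits(128))),
         str(uuid.UUID(int=rng.getrandbits(128))))
        for _ in range(num_pairs)
    ]
    target_key, target_value = pairs.pop()
    pairs.insert(target_position, (target_key, target_value))
    kv_json = json.dumps(dict(pairs), indent=1)  # dicts preserve insertion order
    prompt = (
        "Extract the value corresponding to the specified key "
        "in the JSON object below.\n\n"
        f"{kv_json}\n\n"
        f'Key: "{target_key}"\nCorresponding value:'
    )
    return prompt, target_key, target_value
```

Sweeping `target_position` from the start to the end of the context while holding everything else fixed is what exposes the U-shaped accuracy curve: an evaluator would compare the model's completion against `target_value` at each position.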
