ArXiv TLDR

Context-Aware Search and Retrieval Under Token Erasure

arXiv:2604.18424

Sara Ghasvarianjahromi, Joshua Barr, Yauhen Yakimenka, Jörg Kliewer

cs.IR, cs.IT

TLDR

This paper introduces and analyzes a context-aware search and retrieval model for RAG-like systems that improves retrieval reliability under token erasures by assigning more redundancy to semantically important query features.

Key contributions

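  • Introduces a search and retrieval model for RAG-like systems in which query tokens may be erased.
  • Provides an information-theoretic analysis of retrieval error with partially preserved queries, showing that the vector of similarity margins converges to a multivariate Gaussian, which yields an explicit error approximation and computable upper bounds.
  • Demonstrates that assigning more redundancy to semantically important query features improves retrieval reliability.
  • Shows, through a separate data-driven evaluation on real-world data, that the same importance-aware redundancy principles extend to modern embedding-based retrieval pipelines.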
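
One way to write down the error event analyzed above is sketched below. It rests on assumptions of ours, not necessarily the paper's exact model or notation: query feature i (term count q_i, IDF weight w_i) is sent r_i times over a channel that erases each copy independently with probability ε, an erased feature is set to zero, d_{k,i} is the count of term i in document k, k^⋆ is the correct document, and similarity is the unnormalized TF-IDF-weighted inner product. Because each margin Δ_k is a sum of independent terms, a multivariate central limit theorem motivates the Gaussian approximation; the final union bound, with Φ the standard normal CDF and assuming K_kk > 0, bounds that approximation rather than the exact P_e.

```latex
\begin{align*}
p_i &= 1 - \varepsilon^{r_i}, \qquad \tilde q_i = B_i\, q_i, \qquad B_i \sim \mathrm{Bernoulli}(p_i) \text{ independent},\\
s_k &= \sum_{i=1}^{n} w_i\, \tilde q_i\, d_{k,i}, \qquad
\Delta_k = s_{k^\star} - s_k = \sum_{i=1}^{n} B_i\, c_{k,i}, \qquad c_{k,i} = w_i\, q_i\,(d_{k^\star,i} - d_{k,i}),\\
\mu_k &= \mathbb{E}[\Delta_k] = \sum_{i=1}^{n} p_i\, c_{k,i}, \qquad
K_{k\ell} = \operatorname{Cov}(\Delta_k, \Delta_\ell) = \sum_{i=1}^{n} p_i (1 - p_i)\, c_{k,i}\, c_{\ell,i},\\
P_e &= \Pr\Big(\min_{k \neq k^\star} \Delta_k \le 0\Big) \approx 1 - \Pr(\mathbf{Z} > \mathbf{0}) \le \sum_{k \neq k^\star} \Phi\!\left(-\frac{\mu_k}{\sqrt{K_{kk}}}\right), \qquad \mathbf{Z} \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{K}).
\end{align*}
```

Under these assumptions, raising r_i increases p_i, and the resulting change in μ_k is proportional to c_{k,i}. Extra copies therefore matter most for features that strongly separate the correct document from its competitors, and not at all for features with w_i = 0.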

Why it matters

This work matters for RAG systems whose queries may arrive incomplete or corrupted, for example with some tokens erased. It characterizes how retrieval error depends on which query features survive and shows that protecting the most important features with extra redundancy lowers that error, making retrieval more reliable when parts of the query are lost.

Original Abstract

This paper introduces and analyzes a search and retrieval model for RAG-like systems under token erasures. We provide an information-theoretic analysis of remote document retrieval when query representations are only partially preserved. The query is represented using term-frequency-based features, and semantically adaptive redundancy is assigned according to feature importance. Retrieval is performed using TF-IDF-weighted similarity. We characterize the retrieval error probability by showing that the vector of similarity margins converges to a multivariate Gaussian distribution, yielding an explicit approximation and computable upper bounds. Numerical results support the analysis, while a separate data-driven evaluation using embedding-based retrieval on real-world data shows that the same importance-aware redundancy principles extend to modern retrieval pipelines. Overall, the results show that assigning higher redundancy to semantically important query features improves retrieval reliability.
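
The pipeline described in the abstract (term-frequency query features, redundancy assigned by importance, TF-IDF-weighted scoring, an error whenever some similarity margin is not positive) can be illustrated on a toy corpus. This is a minimal sketch under our own assumptions, not the authors' implementation: each query term is sent `reps[t]` times over an erasure channel that drops every copy independently with probability `eps`, an erased term is simply missing from the received query, similarity is an unnormalized TF-IDF inner product with plain IDF `log(N / df)`, and ties count as errors. The corpus, the query, and the helpers `idf_weights`, `score`, `uniform_redundancy`, `greedy_redundancy`, and `error_probability` are all hypothetical. Instead of the paper's Gaussian approximation, the sketch computes the error probability exactly by enumerating survival patterns, which is only feasible for short queries.

```python
import itertools
import math
from collections import Counter


def idf_weights(docs):
    """Plain IDF, log(N / df); a term that occurs in every document gets weight 0."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d.split()))
    return {t: math.log(n / c) for t, c in df.items()}


def score(received, doc_tf, idf):
    """Unnormalized TF-IDF-weighted inner product (assumed similarity measure)."""
    return sum(idf.get(t, 0.0) * q * doc_tf.get(t, 0) for t, q in received.items())


def uniform_redundancy(terms, budget):
    """Spread the copy budget as evenly as possible across query terms."""
    base, extra = divmod(budget, len(terms))
    return {t: base + (1 if i < extra else 0) for i, t in enumerate(terms)}


def greedy_redundancy(terms, idf, eps, budget):
    """Hypothetical importance-aware rule: one copy per term, then each extra copy goes
    to the term whose expected surviving IDF mass, idf * (1 - eps**r), grows the most."""
    if budget < len(terms):
        raise ValueError("budget must give every term at least one copy")
    reps = {t: 1 for t in terms}
    for _ in range(budget - len(terms)):
        best = max(terms, key=lambda t: idf.get(t, 0.0) * eps ** reps[t] * (1 - eps))
        reps[best] += 1
    return reps


def error_probability(query, target, docs, reps, eps):
    """Exact retrieval error probability, enumerating which query terms survive.
    A term sent reps[t] times is lost only if every copy is erased: p_keep = 1 - eps**reps[t]."""
    idf = idf_weights(docs)
    doc_tfs = [Counter(d.split()) for d in docs]
    q_tf = Counter(query.split())
    terms = list(q_tf)
    p_err = 0.0
    for pattern in itertools.product([False, True], repeat=len(terms)):
        prob = 1.0
        for t, kept in zip(terms, pattern):
            p_keep = 1 - eps ** reps[t]
            prob *= p_keep if kept else 1 - p_keep
        received = {t: q_tf[t] for t, kept in zip(terms, pattern) if kept}
        scores = [score(received, d, idf) for d in doc_tfs]
        # Similarity margins of the target over every competing document.
        margins = [scores[target] - s for k, s in enumerate(scores) if k != target]
        if min(margins) <= 0:  # ties count as retrieval errors
            p_err += prob
    return p_err


# Toy example (hypothetical corpus and query); document 0 is the target.
DOCS = [
    "erasure redundancy retrieval",
    "retrieval search ranking",
    "retrieval search erasure",
]
QUERY = "redundancy erasure retrieval"
EPS, BUDGET = 0.5, 6

terms = list(Counter(QUERY.split()))
idf = idf_weights(DOCS)
uni = uniform_redundancy(terms, BUDGET)
imp = greedy_redundancy(terms, idf, EPS, BUDGET)
print(uni, error_probability(QUERY, 0, DOCS, uni, EPS))
# {'redundancy': 2, 'erasure': 2, 'retrieval': 2} 0.25
print(imp, error_probability(QUERY, 0, DOCS, imp, EPS))
# {'redundancy': 3, 'erasure': 2, 'retrieval': 1} 0.125
```

In this toy case `retrieval` occurs in every document, so its IDF is zero and copies spent on it are wasted. The greedy rule (which maximizes expected surviving IDF mass, a proxy we chose rather than the paper's criterion) moves those copies to `redundancy`, the only query term that separates the target from the third document, and halves the error probability for the same budget of six copies.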
