OBLIQ-Bench: Exposing Overlooked Bottlenecks in Modern Retrievers with Latent and Implicit Queries
Diane Tchuindjo, Devavrat Shah, Omar Khattab
TLDR
OBLIQ-Bench is a new benchmark of "oblique" queries, which seek documents that instantiate latent patterns; it reveals that modern retrievers fail to surface such documents even though LLMs can reliably verify their relevance.
Key contributions
- Identifies "oblique queries" that seek documents instantiating latent patterns or implicit signals.
- Studies three mechanisms through which obliqueness may arise in search problems.
- Introduces OBLIQ-Bench, a suite of five oblique search problems over real long-tail corpora.
- Exposes a retrieval-verification asymmetry: reasoning LLMs reliably recognize latent relevance once documents are surfaced, but even sophisticated retrieval pipelines fail to surface most relevant documents in the first place (illustrated in the sketch below).
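To make the asymmetry concrete, here is a minimal retrieve-then-verify sketch. Everything in it (the corpus, the query, the bag-of-words similarity standing in for a retriever, and the `llm_judge` stub) is an illustrative assumption, not the paper's actual data or pipeline:

```python
# Retrieve-then-verify sketch for one oblique query. Corpus, query, and
# helpers are illustrative assumptions, not the paper's data or pipeline.
import math
from collections import Counter

corpus = [
    "Another launch scrubbed. Of course it was. Why do we even bother.",  # implicit frustration
    "The launch was cancelled due to weather conditions.",                # neutral report
    "Lovely day for a picnic!",                                           # irrelevant
]
query = "posts about the cancelled launch that express implicit frustration"

def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity: a crude stand-in for a retriever."""
    ta, tb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ta[w] * tb[w] for w in ta)
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0

# Retrieval: rank by surface similarity to the query. Lexical overlap
# ("the", "cancelled", "launch") puts the neutral report first, while the
# tweet that actually instantiates the latent pattern shares almost no
# vocabulary with the query and ranks behind it.
for doc in sorted(corpus, key=lambda d: cosine(query, d), reverse=True):
    print(f"{cosine(query, doc):.3f}  {doc}")

# Verification: shown any single candidate, an LLM judge can reliably
# decide whether it expresses the implicit stance, but it only ever sees
# what retrieval surfaced above.
def llm_judge(document: str, pattern: str) -> bool:
    """Placeholder for an LLM relevance call (hypothetical helper)."""
    raise NotImplementedError("prompt a reasoning LLM here")
```

Running this ranks the neutral weather report above the frustrated tweet, because lexical overlap rewards the report rather than the document that instantiates the latent pattern. That is a toy version of the asymmetry the abstract describes: verifying a surfaced candidate is easy, but surfacing the right candidates is the bottleneck.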
Why it matters
This paper highlights a critical gap in current retrieval systems: they struggle with queries whose relevance criteria are implicit rather than stated on the surface. It provides a benchmark to drive research into retrieval architectures that capture latent patterns, pushing efficient search beyond the saturation of existing retrieval benchmarks.
Original Abstract
Retrieval benchmarks are increasingly saturating, but we argue that efficient search is far from a solved problem. We identify a class of queries we call oblique, which seek documents that instantiate a latent pattern, like finding all tweets that express an implicit stance, chat logs that demonstrate a particular failure mode, or transcripts that match an abstract scenario. We study three mechanisms through which obliqueness may arise and introduce OBLIQ-Bench, a suite of five oblique search problems over real long-tail corpora. OBLIQ-Bench exposes an overlooked asymmetry between retrieval and verification, where reasoning LLMs reliably recognize latent relevance whenever relevant documents are surfaced, but even sophisticated retrieval pipelines fail to surface most relevant documents in the first place. We hope that OBLIQ-Bench will drive research into retrieval architectures that efficiently capture latent patterns and implicit signals in large corpora.