Can QPP Choose the Right Query Variant? Evaluating Query Variant Selection for RAG Pipelines

April 24, 2026 · arXiv:2604.22661

Negar Arabzadeh, Andrew Drozdov, Michael Bendersky, Matei Zaharia

cs.IR, cs.CL

TLDR

This paper evaluates Query Performance Prediction (QPP) for selecting optimal query variants in RAG pipelines, revealing a "utility gap" between retrieval and generation.

Key contributions

Investigates QPP for intra-topic query variant selection in RAG pipelines.
Identifies a "utility gap" where retrieval-optimal variants don't always yield best generations.
Shows that QPP can reliably identify variants that improve end-to-end RAG quality over the original query.
Finds that lightweight pre-retrieval QPP often matches or outperforms more expensive post-retrieval methods.

Why it matters

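Running the full RAG pipeline for every query reformulation is costly. This paper uses QPP to pick a promising variant before paying for retrieval and generation, which improves end-to-end RAG quality over the original query. It also reveals a "utility gap": variants that maximize retrieval metrics such as nDCG don't always yield the best generations. Lightweight pre-retrieval predictors often match or outperform post-retrieval ones, which makes low-latency variant selection practical.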
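To make the "utility gap" concrete, here is a minimal sketch for a single topic. The function name `utility_gap`, the variant labels, and all scores are hypothetical and invented for illustration; they are not taken from the paper. The gap is computed as the generation quality lost when the variant is chosen by the retrieval metric alone.

```python
def utility_gap(retrieval_scores, generation_scores):
    """Compare the retrieval-optimal and generation-optimal variants of one topic.

    Both arguments map a variant label to a score (higher is better).
    """
    best_ret = max(retrieval_scores, key=retrieval_scores.get)
    best_gen = max(generation_scores, key=generation_scores.get)
    # Generation quality lost by trusting the retrieval metric alone.
    gap = generation_scores[best_gen] - generation_scores[best_ret]
    return best_ret, best_gen, gap


# Made-up scores for three variants of the same information need.
ndcg = {"original": 0.42, "variant_a": 0.61, "variant_b": 0.55}
answer_quality = {"original": 0.50, "variant_a": 0.58, "variant_b": 0.70}

best_ret, best_gen, gap = utility_gap(ndcg, answer_quality)
print(best_ret, best_gen, round(gap, 2))
```

In this toy case the variant with the highest nDCG is not the one with the best generated answer, so selecting on nDCG alone leaves answer quality on the table.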

Original Abstract

Large Language Models (LLMs) have made query reformulation ubiquitous in modern retrieval and Retrieval-Augmented Generation (RAG) pipelines, enabling the generation of multiple semantically equivalent query variants. However, executing the full pipeline for every reformulation is computationally expensive, motivating selective execution: can we identify the best query variant before incurring downstream retrieval and generation costs? We investigate Query Performance Prediction (QPP) as a mechanism for variant selection across ad-hoc retrieval and end-to-end RAG. Unlike traditional QPP, which estimates query difficulty across topics, we study intra-topic discrimination - selecting the optimal reformulation among competing variants of the same information need. Through large-scale experiments on TREC-RAG using both sparse and dense retrievers, we evaluate pre- and post-retrieval predictors under correlation- and decision-based metrics. Our results reveal a systematic divergence between retrieval and generation objectives: variants that maximize ranking metrics such as nDCG often fail to produce the best generated answers, exposing a "utility gap" between retrieval relevance and generation fidelity. Nevertheless, QPP can reliably identify variants that improve end-to-end quality over the original query. Notably, lightweight pre-retrieval predictors frequently match or outperform more expensive post-retrieval methods, offering a latency-efficient approach to robust RAG.
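As a rough illustration of selective execution with a pre-retrieval predictor, the sketch below scores each variant by its average inverse document frequency (a classic pre-retrieval QPP signal) and keeps the top one. This is an assumption for illustration only: the abstract does not say which predictors the paper evaluates, and the helper names (`avg_idf`, `select_variant`), the toy corpus, and the variants are all hypothetical.

```python
import math
from collections import Counter


def avg_idf(query, doc_freq, num_docs):
    """Mean smoothed IDF of the query terms; needs only collection statistics."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    # Add-one smoothing so terms absent from the collection do not divide by zero.
    idfs = [math.log((num_docs + 1) / (doc_freq.get(t, 0) + 1)) for t in terms]
    return sum(idfs) / len(terms)


def select_variant(variants, doc_freq, num_docs):
    """Return the variant with the highest predicted performance."""
    return max(variants, key=lambda q: avg_idf(q, doc_freq, num_docs))


# Toy collection (made up) from which document frequencies are computed.
corpus = [
    "solar panels convert sunlight into electricity",
    "the cost of solar panels has dropped",
    "wind turbines generate electricity from wind",
    "the history of the electric grid",
]
doc_freq = Counter(term for doc in corpus for term in set(doc.split()))

variants = [
    "solar panels electricity",
    "solar panels convert sunlight",
    "the cost of the panels",
]
best = select_variant(variants, doc_freq, len(corpus))
print(best)
```

Only the selected variant would then be sent through retrieval and generation. Because the score depends on term statistics alone, selection happens before any retrieval cost is incurred, which is the latency advantage the abstract attributes to pre-retrieval predictors.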
