ResRank: Unifying Retrieval and Listwise Reranking via End-to-End Joint Training with Residual Passage Compression
Xiaojie Ke, Shuai Zhang, Liansheng Sun, Yongjin Wang, Hengjun Jiang, et al.
TLDR
ResRank unifies retrieval and reranking by compressing passages into single embeddings for efficient, effective LLM-based listwise ranking.
Key contributions
- Unifies retrieval and listwise reranking via end-to-end joint training.
- Compresses passages into single embeddings using an Encoder-LLM, reducing input length.
- Introduces a residual connection and one-step cosine similarity for efficient, effective ranking.
- Achieves competitive ranking effectiveness with zero generated tokens, processing only one token per passage.
Why it matters
LLM-based listwise reranking is powerful but slow, and its quality degrades as inputs grow long (the "lost in the middle" phenomenon). ResRank sidesteps both problems by compressing each passage into a single embedding, making advanced listwise ranking practical for industrial deployment without sacrificing effectiveness.
Original Abstract
Large language model (LLM) based listwise reranking has emerged as the dominant paradigm for achieving state-of-the-art ranking effectiveness in information retrieval. However, its reliance on feeding full passage texts into the LLM introduces two critical bottlenecks: the "lost in the middle" phenomenon degrades ranking quality as input length grows, and the inference latency scales super-linearly with sequence length, rendering it impractical for industrial deployment. In this paper, we present ResRank, a unified retrieval-reranking framework that fundamentally addresses both challenges. Inspired by multimodal LLMs that project visual inputs into compact token representations, ResRank employs an Encoder-LLM to compress each candidate passage into a single embedding, which is then fed alongside the query text into a Reranker-LLM for listwise ranking. To alleviate the misalignment between the compressed representation space and the ranking space, we introduce a residual connection structure that combines encoder embeddings with contextualized hidden states from the reranker. Furthermore, we replace the conventional autoregressive decoding with a one-step cosine-similarity-based scoring mechanism, eliminating the generation bottleneck entirely. ResRank is trained through a carefully designed dual-stage, multi-task, end-to-end joint optimization strategy that simultaneously trains the encoder and reranker, achieving learning objective alignment between retrieval and reranking while substantially reducing training complexity. Extensive experiments on TREC Deep Learning and eight BEIR benchmark datasets demonstrate that ResRank achieves competitive or superior ranking effectiveness compared to existing approaches while requiring zero generated tokens and processing only one token per passage, yielding a fundamentally better balance between effectiveness and efficiency.
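The residual connection and one-step scoring described in the abstract can be illustrated with a toy numpy sketch. This is not the actual implementation: the real system uses an Encoder-LLM and a Reranker-LLM, and all function names, shapes, and the plain vector addition used for the residual combination here are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Normalize vectors to unit length along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def resrank_score(query_hidden, passage_embs, reranker_hidden):
    """Toy sketch of ResRank's scoring step (hypothetical helper).

    query_hidden:    (d,)   hidden state for the query from the reranker
    passage_embs:    (n, d) one encoder embedding per candidate passage
    reranker_hidden: (n, d) contextualized hidden states at passage positions
    Returns (n,) relevance scores.
    """
    # Residual connection: combine each encoder embedding with the
    # reranker's contextualized hidden state for that passage.
    combined = l2_normalize(passage_embs + reranker_hidden)
    q = l2_normalize(query_hidden)
    # One-step cosine-similarity scoring: a single matrix-vector product
    # replaces autoregressive decoding, so zero tokens are generated.
    return combined @ q

# Ranking is then a simple sort by descending score:
# order = np.argsort(-resrank_score(q, P, H))
```

Because each passage contributes only a single embedding to the reranker's input, the listwise context stays short regardless of passage length, which is the source of the efficiency gains claimed in the abstract.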