ArXiv TLDR

DiffRetriever: Parallel Representative Tokens for Retrieval with Diffusion Language Models

arXiv: 2605.07210

Shuai Wang, Yin Yu, Shengyao Zhuang, Bevan Koopman, Guido Zuccon

cs.IR, cs.CL

TLDR

DiffRetriever uses diffusion language models to generate multiple representative tokens in parallel, significantly improving retrieval performance over sequential autoregressive methods.

Key contributions

  • Introduces DiffRetriever, using diffusion LMs for parallel generation of multiple representative tokens.
  • Demonstrates substantial retrieval performance gains over single-token and autoregressive multi-token approaches.
  • Achieves the strongest BEIR-7 results in the paper's comparison after supervised fine-tuning on the Dream diffusion backbone.
  • Identifies adaptive token budget selection as a promising area for future work.

Why it matters

This paper overcomes the inefficiency of autoregressive models for multi-token retrieval by leveraging diffusion LMs' parallel generation capabilities. The fine-tuned model tops the paper's BEIR-7 comparison, outperforming PromptReps, the encoder-style DiffEmbed baseline, and the contrastively fine-tuned RepLLaMA, showcasing the potential of diffusion models beyond text generation. This work points toward more efficient and more capable retrieval systems.

Original Abstract

PromptReps showed that an autoregressive language model can be used directly as a retriever by prompting it to generate dense and sparse representations of a query or passage. Extending this to multiple representatives is inefficient for autoregressive models, since tokens must be generated sequentially, and prior multi-token variants did not reliably improve over single-token decoding. We show that the bottleneck is sequential generation, not the multi-token idea itself. DiffRetriever is a representative-token retriever for diffusion language models: it appends K masked positions to the prompt and reads all K in a single bidirectional forward pass. Across in-domain and out-of-domain evaluation, multi-token DiffRetriever substantially improves over single-token on every diffusion backbone we test, while autoregressive multi-token is flat or negative and pays a latency cost that scales with K where diffusion does not. After supervised fine-tuning, DiffRetriever on Dream is the strongest BEIR-7 retriever in our comparison, ahead of PromptReps, the encoder-style DiffEmbed baseline on the same diffusion backbones, and the contrastively fine-tuned single-vector RepLLaMA. A per-query oracle on the frozen base model exceeds contrastive fine-tuning at the same fixed budget, pointing to adaptive budget selection as future work. Code is available at https://github.com/ielab/diffretriever.
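Illustrative sketch

The core mechanism is easy to picture in code. Below is a minimal sketch of the parallel representative-token idea, not the authors' implementation: a plain masked LM (bert-base-uncased) stands in for the Dream diffusion backbone, since both can fill K masked positions in a single bidirectional forward pass. The prompt wording, the value of K, and the argmax readout are all illustrative assumptions.

```python
# Minimal sketch of the parallel representative-token idea, NOT the
# authors' implementation. A plain masked LM (bert-base-uncased) stands
# in for the Dream diffusion backbone: both can fill K masked positions
# in one bidirectional forward pass. Prompt wording, K, and the argmax
# readout are illustrative assumptions.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

K = 4  # fixed representative-token budget

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

passage = "Diffusion language models can decode many tokens in parallel."
# Append K masked positions to the prompt; all K are read out at once.
prompt = (
    f'Passage: "{passage}" This passage is about: '
    + " ".join([tok.mask_token] * K)
)

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # a single bidirectional forward pass

# Locate the K masked slots and read one token prediction per slot.
mask_positions = (inputs.input_ids[0] == tok.mask_token_id).nonzero(as_tuple=True)[0]
representative_ids = logits[0, mask_positions].argmax(dim=-1)
print(tok.convert_ids_to_tokens(representative_ids.tolist()))
```

Because all K slots are filled in one pass, latency stays flat in K, whereas an autoregressive retriever must decode the K tokens sequentially. In the paper, the per-slot outputs, rather than the argmax shown here, underlie the dense and sparse representations used for scoring, following PromptReps.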
