ArXiv TLDR

Aligning Dense Retrievers with LLM Utility via Distillation

arXiv: 2604.22722

Rajinder Sandhu, Di Mu, Cheng Chang, Md Shahriar Tasjid, Himanshu Rai + 2 more

cs.IR · cs.AI · cs.LG

TLDR

UAE aligns dense retrievers with LLM utility via distillation, achieving significant performance gains and speedup over LLM re-ranking for RAG.

Key contributions

  • Introduces Utility-Aligned Embeddings (UAE) to combine dense retrieval speed with LLM re-ranking utility.
  • Trains a bi-encoder using a Utility-Modulated InfoNCE objective to imitate LLM utility distribution.
  • Injects graded utility signals directly into the embedding space, avoiding test-time LLM inference.
  • Improves Recall@1 by 30.59% and MAP by 30.16% over BGE-Base on QASPER, while running over 180x faster than LLM re-ranking methods.
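The distribution-matching idea behind the Utility-Modulated InfoNCE objective can be sketched as follows. The paper summary does not give the exact loss formula, so the function below is an illustrative assumption: the teacher distribution is a softmax over LLM utility scores (e.g., perplexity reduction per passage), the student distribution is a softmax over query-passage similarities, and the loss is the cross-entropy between the two. The temperatures `tau_sim` and `tau_util` are hypothetical hyperparameters.

```python
import numpy as np

def softmax(x, tau=1.0):
    """Numerically stable softmax with temperature tau."""
    z = np.asarray(x, dtype=float) / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def utility_modulated_infonce(query_emb, passage_embs, utilities,
                              tau_sim=0.05, tau_util=1.0):
    """Cross-entropy between a teacher distribution over LLM utility
    scores and a student distribution over bi-encoder similarities.
    Illustrative sketch only; not the paper's exact objective."""
    sims = passage_embs @ query_emb          # dot products (cosine if L2-normalized)
    p_student = softmax(sims, tau_sim)       # retriever's candidate distribution
    p_teacher = softmax(utilities, tau_util) # graded utility signal from the LLM
    return -np.sum(p_teacher * np.log(p_student + 1e-12))
```

Minimizing this loss pushes the embedding space to reproduce the LLM's graded utility ranking, so no LLM call is needed at retrieval time: the loss is lower when high-utility passages score high similarity, and higher when the similarity ranking contradicts the utility ranking.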

Why it matters

This paper addresses a critical bottleneck in RAG by making utility-aware retrieval practical and scalable. By distilling LLM utility into dense retrievers, it offers a high-performance, efficient solution for generating reliable contexts. This approach significantly enhances RAG system effectiveness without incurring high computational costs.

Original Abstract

Dense vector retrieval is the practical backbone of Retrieval-Augmented Generation (RAG), but similarity search can suffer from precision limitations. Conversely, utility-based approaches leveraging LLM re-ranking often achieve superior performance but are computationally prohibitive and prone to noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to merge these advantages into a practical, high-performance retrieval method. We formulate retrieval as a distribution matching problem, training a bi-encoder to imitate a utility distribution derived from perplexity reduction using a Utility-Modulated InfoNCE objective. This approach injects graded utility signals directly into the embedding space without requiring test-time LLM inference. On the QASPER benchmark, UAE improves retrieval Recall@1 by 30.59%, MAP by 30.16% and Token F1 by 17.3% over the strong semantic baseline BGE-Base. Crucially, UAE is over 180x faster than the efficient LLM re-ranking methods while preserving competitive performance, demonstrating that aligning retrieval with generative utility yields reliable contexts at scale.
