Aligning Dense Retrievers with LLM Utility via Distillation
Rajinder Sandhu, Di Mu, Cheng Chang, Md Shahriar Tasjid, Himanshu Rai + 2 more
TLDR
UAE aligns dense retrievers with LLM utility via distillation, achieving significant performance gains and speedup over LLM re-ranking for RAG.
Key contributions
- Introduces Utility-Aligned Embeddings (UAE) to combine dense retrieval speed with LLM re-ranking utility.
- Trains a bi-encoder using a Utility-Modulated InfoNCE objective to imitate LLM utility distribution.
- Injects graded utility signals directly into the embedding space, avoiding test-time LLM inference.
- Outperforms BGE-Base by 30%+ on QASPER and is 180x faster than LLM re-ranking methods.
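The Utility-Modulated InfoNCE objective described above can be read as a soft distillation target: instead of a one-hot positive, the bi-encoder's similarity distribution is trained to match a softmax over LLM-derived utility scores. A minimal NumPy sketch under that interpretation (function names, temperatures, and the cross-entropy form are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def softmax(x, temp=1.0):
    """Numerically stable softmax with a temperature parameter."""
    z = x / temp
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def utility_modulated_infonce(query_emb, passage_embs, utilities,
                              tau_sim=0.05, tau_util=1.0):
    """Cross-entropy between the retriever's similarity distribution
    and the LLM utility distribution (a soft-target InfoNCE).

    query_emb:    (d,) normalized query embedding
    passage_embs: (n, d) normalized passage embeddings
    utilities:    (n,) per-passage utility scores from the LLM
    """
    sims = passage_embs @ query_emb            # cosine similarities
    p_model = softmax(sims, tau_sim)           # retriever's distribution
    p_target = softmax(utilities, tau_util)    # graded utility target
    return -np.sum(p_target * np.log(p_model + 1e-12))
```

With a one-hot utility vector this reduces to standard InfoNCE; graded utilities instead spread probability mass across partially useful passages, which is how the graded signal is injected into the embedding space.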
Why it matters
This paper addresses a critical bottleneck in RAG by making utility-aware retrieval practical and scalable. By distilling LLM utility into dense retrievers, it offers a high-performance, efficient solution for generating reliable contexts. This approach significantly enhances RAG system effectiveness without incurring high computational costs.
Original Abstract
Dense vector retrieval is the practical backbone of Retrieval-Augmented Generation (RAG), but similarity search can suffer from precision limitations. Conversely, utility-based approaches leveraging LLM re-ranking often achieve superior performance but are computationally prohibitive and prone to noise inherent in perplexity estimation. We propose Utility-Aligned Embeddings (UAE), a framework designed to merge these advantages into a practical, high-performance retrieval method. We formulate retrieval as a distribution matching problem, training a bi-encoder to imitate a utility distribution derived from perplexity reduction using a Utility-Modulated InfoNCE objective. This approach injects graded utility signals directly into the embedding space without requiring test-time LLM inference. On the QASPER benchmark, UAE improves retrieval Recall@1 by 30.59%, MAP by 30.16%, and Token F1 by 17.3% over the strong semantic baseline BGE-Base. Crucially, UAE is over 180x faster than efficient LLM re-ranking methods while preserving competitive performance, demonstrating that aligning retrieval with generative utility yields reliable contexts at scale.