Task-Adaptive Embedding Refinement via Test-time LLM Guidance
Ariel Gera, Shir Ashury-Tahan, Gal Bloch, Ohad Eytan, Assaf Toledo
TLDR
This paper introduces an LLM-guided query refinement method that adapts query embeddings in real time for challenging zero-shot search and classification tasks.
Key contributions
- Proposes an LLM-guided query refinement paradigm for embedding models.
- Refines query embeddings in real time using LLM feedback on a small set of documents.
- Achieves up to +25% relative improvement across diverse search and classification benchmarks.
- Improves ranking quality and binary separation in the embedding space.
Why it matters
This method significantly extends the usability of embedding models to complex zero-shot tasks. It offers a practical alternative to costly LLM pipelines for corpus-scale deployment, making embeddings more versatile.
Original Abstract
We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling embeddings to adapt in real time to the target task. We conduct extensive experiments with state-of-the-art text embedding models across a diverse set of challenging search and classification benchmarks. Empirical results indicate that LLM-guided query refinement yields consistent gains across all models and datasets, with relative improvements of up to +25% in literature search, intent detection, key-point matching, and nuanced query-instruction following. The refined queries improve ranking quality and induce clearer binary separation across the corpus, enabling the embedding space to better reflect the nuanced, task-specific constraints of each ad-hoc user query. Importantly, this expands the range of practical settings in which embedding models can be effectively deployed, making them a compelling alternative when costly LLM pipelines are not viable at corpus-scale. We release our experimental code for reproducibility, at https://github.com/IBM/task-aware-embedding-refinement.
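The abstract describes refining a query embedding using LLM relevance feedback on a small set of documents. As a rough intuition for what such a refinement step could look like, here is a minimal Rocchio-style sketch: the paper's actual update rule is not specified in this digest, so the weights, the `refine_query` function, and the `llm_judge` stub are illustrative assumptions rather than the authors' method.

```python
# Minimal Rocchio-style sketch of LLM-guided query refinement.
# NOT the paper's algorithm: the update rule, weights, and llm_judge
# stub are illustrative assumptions.
import numpy as np

def refine_query(query_vec, doc_vecs, llm_judge, alpha=1.0, beta=0.5, gamma=0.25):
    """Shift the query embedding toward documents the LLM judges relevant
    and away from those it judges irrelevant (classic Rocchio update)."""
    judgments = [llm_judge(i) for i in range(len(doc_vecs))]
    pos = [v for v, j in zip(doc_vecs, judgments) if j]
    neg = [v for v, j in zip(doc_vecs, judgments) if not j]
    refined = alpha * query_vec
    if pos:
        refined += beta * np.mean(pos, axis=0)
    if neg:
        refined -= gamma * np.mean(neg, axis=0)
    return refined / np.linalg.norm(refined)  # unit norm for cosine search

# Toy demo: 2-D embeddings, an oracle "judge" standing in for an LLM call.
query = np.array([1.0, 0.0])
docs = [np.array([0.6, 0.8]), np.array([0.6, -0.8])]
judge = lambda i: i == 0  # pretend the LLM marks only doc 0 as relevant
refined = refine_query(query, docs, judge)
print(refined @ docs[0] > refined @ docs[1])  # → True
```

The demo shows the core idea from the abstract: after feedback on a handful of documents, the refined query scores the relevant document higher, so the same embedding index can be re-ranked without running the LLM over the full corpus.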