Task-Adaptive Embedding Refinement via Test-time LLM Guidance
Ariel Gera, Shir Ashury-Tahan, Gal Bloch, Ohad Eytan, Assaf Toledo
TLDR
This paper introduces an LLM-guided query refinement method that adapts query embeddings in real time for challenging zero-shot search and classification tasks.
Key contributions
- Proposes an LLM-guided query refinement paradigm for embedding models.
- Refines query embeddings in real time using LLM feedback on a small set of documents.
- Achieves up to +25% relative improvement across diverse search and classification benchmarks.
- Improves ranking quality and binary separation in the embedding space.
Why it matters
This method significantly extends the usability of embedding models to complex zero-shot tasks. It offers a practical alternative to costly LLM pipelines for corpus-scale deployment, making embeddings more versatile.
Original Abstract
We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling embeddings to adapt in real time to the target task. We conduct extensive experiments with state-of-the-art text embedding models across a diverse set of challenging search and classification benchmarks. Empirical results indicate that LLM-guided query refinement yields consistent gains across all models and datasets, with relative improvements of up to +25% in literature search, intent detection, key-point matching, and nuanced query-instruction following. The refined queries improve ranking quality and induce clearer binary separation across the corpus, enabling the embedding space to better reflect the nuanced, task-specific constraints of each ad-hoc user query. Importantly, this expands the range of practical settings in which embedding models can be effectively deployed, making them a compelling alternative when costly LLM pipelines are not viable at corpus-scale. We release our experimental code for reproducibility, at https://github.com/IBM/task-aware-embedding-refinement.
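The abstract describes refining a query embedding using LLM relevance feedback on a small set of documents. As a rough intuition for what such a refinement step could look like, here is a minimal Rocchio-style sketch: the paper's actual update rule is not specified in this digest, so the weights, the `refine_query` function, and the `llm_judge` stub are illustrative assumptions rather than the authors' method.

```python
# Minimal Rocchio-style sketch of LLM-guided query refinement.
# NOT the paper's algorithm: the update rule, weights, and llm_judge
# stub are illustrative assumptions.
import numpy as np

def refine_query(query_vec, doc_vecs, llm_judge, alpha=1.0, beta=0.5, gamma=0.25):
    """Shift the query embedding toward documents the LLM judges relevant
    and away from those it judges irrelevant (classic Rocchio update)."""
    judgments = [llm_judge(i) for i in range(len(doc_vecs))]
    pos = [v for v, j in zip(doc_vecs, judgments) if j]
    neg = [v for v, j in zip(doc_vecs, judgments) if not j]
    refined = alpha * query_vec
    if pos:
        refined += beta * np.mean(pos, axis=0)
    if neg:
        refined -= gamma * np.mean(neg, axis=0)
    return refined / np.linalg.norm(refined)  # unit norm for cosine search

# Toy demo: 2-D embeddings, an oracle "judge" standing in for an LLM call.
query = np.array([1.0, 0.0])
docs = [np.array([0.6, 0.8]), np.array([0.6, -0.8])]
judge = lambda i: i == 0  # pretend the LLM marks only doc 0 as relevant
refined = refine_query(query, docs, judge)
print(refined @ docs[0] > refined @ docs[1])  # → True
```

The demo shows the core idea from the abstract: after feedback on a handful of documents, the refined query scores the relevant document higher, so the same embedding index can be re-ranked without running the LLM over the full corpus.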