ArXiv TLDR

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

arXiv:2605.12487

Ariel Gera, Shir Ashury-Tahan, Gal Bloch, Ohad Eytan, Assaf Toledo

cs.CL cs.IR cs.LG

TLDR

This paper introduces an LLM-guided query refinement method that adapts embedding models in real time for challenging zero-shot search and classification tasks.

Key contributions

  • Proposes an LLM-guided query refinement paradigm for embedding models.
  • Refines query embeddings in real time using LLM feedback on a small set of documents.
  • Achieves up to +25% relative improvement across diverse search and classification benchmarks.
  • Improves ranking quality and binary separation in the embedding space.

Why it matters

This method significantly extends the usability of embedding models to complex zero-shot tasks. It offers a practical alternative to costly LLM pipelines for corpus-scale deployment, making embeddings more versatile.

Original Abstract

We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user query using feedback from a generative LLM on a small set of documents, enabling embeddings to adapt in real time to the target task. We conduct extensive experiments with state-of-the-art text embedding models across a diverse set of challenging search and classification benchmarks. Empirical results indicate that LLM-guided query refinement yields consistent gains across all models and datasets, with relative improvements of up to +25% in literature search, intent detection, key-point matching, and nuanced query-instruction following. The refined queries improve ranking quality and induce clearer binary separation across the corpus, enabling the embedding space to better reflect the nuanced, task-specific constraints of each ad-hoc user query. Importantly, this expands the range of practical settings in which embedding models can be effectively deployed, making them a compelling alternative when costly LLM pipelines are not viable at corpus-scale. We release our experimental code for reproducibility, at https://github.com/IBM/task-aware-embedding-refinement.
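The abstract describes refining a query embedding using LLM relevance feedback on a small set of documents. As an illustration of the general idea (not the paper's exact procedure — the update rule, function name, and parameters below are assumptions, shown here as a classic Rocchio-style feedback step), the loop could be sketched as:

```python
import numpy as np

def refine_query_embedding(query_emb, doc_embs, llm_relevance,
                           alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio-style refinement of a query embedding from LLM feedback.

    query_emb:     (d,) initial query embedding
    doc_embs:      (n, d) embeddings of a small candidate document set
    llm_relevance: length-n booleans, an LLM's relevance judgment per document
    """
    mask = np.array(llm_relevance)
    rel, nonrel = doc_embs[mask], doc_embs[~mask]
    # Move the query toward LLM-approved documents, away from rejected ones.
    refined = alpha * query_emb
    if len(rel):
        refined = refined + beta * rel.mean(axis=0)
    if len(nonrel):
        refined = refined - gamma * nonrel.mean(axis=0)
    # Re-normalize so the refined query lives on the unit sphere,
    # as cosine-similarity retrieval typically assumes.
    return refined / np.linalg.norm(refined)
```

In such a setup, the LLM is called once per query on a handful of candidate documents, and all corpus-scale scoring still happens with cheap embedding similarity — which is the cost argument the abstract makes against full LLM pipelines.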
