ArXiv TLDR

Skill Retrieval Augmentation for Agentic AI

2604.24594

Weihang Su, Jianming Long, Qingyao Ai, Yichen Tang, Changyue Wang + 2 more

cs.CL, cs.AI

TLDR

This paper introduces Skill Retrieval Augmentation (SRA), a paradigm in which LLM agents dynamically retrieve skills from large external corpora, and presents SRA-Bench, a benchmark for evaluating the full SRA pipeline.

Key contributions

  • Formulates Skill Retrieval Augmentation (SRA) for LLM agents to dynamically retrieve skills from large external corpora.
  • Introduces SRA-Bench, the first benchmark for evaluating skill retrieval, incorporation, and execution in agents.
  • SRA-Bench includes 5,400 test instances and a corpus of 26,262 skills, including 636 gold skills.
  • Uncovers a critical gap: LLMs struggle to selectively incorporate skills, loading them even when unnecessary or irrelevant.

Why it matters

Current LLM agents struggle to scale with growing skill sets due to context limitations. This paper offers a new paradigm, SRA, to address this by enabling dynamic skill retrieval. It also provides a crucial benchmark, SRA-Bench, and identifies key challenges in skill incorporation, paving the way for more capable and scalable AI agents.
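To make the SRA pipeline concrete, here is a minimal toy sketch of its first two stages: retrieving candidate skills from a corpus and then deciding whether to actually load them. All names, the bag-of-words similarity, and the threshold are illustrative assumptions, not the paper's method (which evaluates real LLM agents over a 26,262-skill corpus); the point is the structure: retrieval ranks candidates, and incorporation is a separate, selective decision.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would use a dense encoder.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_skills(task: str, corpus: list[dict], k: int = 2) -> list[dict]:
    # Stage 1: rank the skill corpus by similarity to the task and keep top-k.
    q = embed(task)
    ranked = sorted(corpus, key=lambda s: cosine(q, embed(s["description"])),
                    reverse=True)
    return ranked[:k]

def should_load(task: str, skill: dict, threshold: float = 0.3) -> bool:
    # Stage 2 (incorporation): load a retrieved skill only if it is relevant
    # enough. The paper finds current agents skip this judgment and load
    # skills at similar rates regardless of relevance.
    return cosine(embed(task), embed(skill["description"])) >= threshold

# Hypothetical skill corpus for illustration.
corpus = [
    {"name": "pdf_table_extractor", "description": "extract tables from pdf documents"},
    {"name": "unit_converter", "description": "convert units of measurement"},
    {"name": "sql_runner", "description": "run sql queries against a database"},
]

task = "extract tables from a pdf report"
candidates = retrieve_skills(task, corpus)
loaded = [s["name"] for s in candidates if should_load(task, s)]
print(loaded)  # only the relevant skill survives the incorporation gate
```

In this sketch the retriever returns two candidates, but the incorporation gate loads only `pdf_table_extractor`; the distractor `sql_runner` is retrieved yet filtered out, which is exactly the selective behavior the paper reports current LLM agents lack.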

Original Abstract

As large language models (LLMs) evolve into agentic problem solvers, they increasingly rely on external, reusable skills to handle tasks beyond their native parametric capabilities. In existing agent systems, the dominant strategy for incorporating skills is to explicitly enumerate available skills within the context window. However, this strategy fails to scale: as skill corpora expand, context budgets are consumed rapidly, and the agent becomes markedly less accurate in identifying the right skill. To this end, this paper formulates Skill Retrieval Augmentation (SRA), a new paradigm in which agents dynamically retrieve, incorporate, and apply relevant skills from large external skill corpora on demand. To make this problem measurable, we construct a large-scale skill corpus and introduce SRA-Bench, the first benchmark for decomposed evaluation of the full SRA pipeline, covering skill retrieval, skill incorporation, and end-task execution. SRA-Bench contains 5,400 capability-intensive test instances and 636 manually constructed gold skills, which are mixed with web-collected distractor skills to form a large-scale corpus of 26,262 skills. Extensive experiments show that retrieval-based skill augmentation can substantially improve agent performance, validating the promise of the paradigm. At the same time, we uncover a fundamental gap in skill incorporation: current LLM agents tend to load skills at similar rates, regardless of whether a gold skill is retrieved or whether the task actually requires external capabilities. This shows that the bottleneck in skill augmentation lies not only in retrieval but also in the base model's ability to determine which skill to load and when external loading is actually needed. These findings position SRA as a distinct research problem and establish a foundation for the scalable augmentation of capabilities in future agent systems.
