SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
Zhengxi Lu, Zhiyuan Yao, Jinyang Wu, Chengcheng Han, Qi Gu + 5 more
TLDR
SKILL0 is an in-context RL framework that internalizes agent skills into LLM parameters, enabling zero-shot autonomous behavior.
Key contributions
- Introduces SKILL0, an in-context RL framework for internalizing agent skills into LLM parameters.
- Employs a training curriculum that progressively withdraws skill context, enabling zero-shot behavior.
- Uses a Dynamic Curriculum to evaluate and retain only beneficial skills, optimizing learning efficiency.
- Achieves significant performance gains (+9.7% ALFWorld, +6.6% Search-QA) with efficient context.
Why it matters
Current LLM agents depend on runtime skill retrieval, which is inefficient and never amounts to true skill acquisition. SKILL0 internalizes skills into model parameters, enabling LLMs to genuinely learn and apply skills autonomously. This reduces token overhead and significantly improves performance on agentic tasks.
Original Abstract
Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill content imposes substantial token overhead, and the model never truly acquires the knowledge it merely follows. We ask whether skills can instead be internalized into model parameters, enabling zero-shot autonomous behavior without any runtime skill retrieval. We introduce SKILL0, an in-context reinforcement learning framework designed for skill internalization. SKILL0 introduces a training-time curriculum that begins with full skill context and progressively withdraws it. Skills are grouped offline by category and rendered with interaction history into a compact visual context, teaching the model tool invocation and multi-turn task completion. A Dynamic Curriculum then evaluates each skill file's on-policy helpfulness, retaining only those from which the current policy still benefits within a linearly decaying budget, until the agent operates in a fully zero-shot setting. Extensive agentic experiments demonstrate that SKILL0 achieves substantial improvements over the standard RL baseline (+9.7% for ALFWorld and +6.6% for Search-QA), while maintaining a highly efficient context of fewer than 0.5k tokens per step. Our code is available at https://github.com/ZJU-REAL/SkillZero.
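The Dynamic Curriculum described in the abstract can be sketched as a simple selection loop: each skill file carries an estimate of its on-policy helpfulness, and at each training step only beneficial skills are greedily packed into a token budget that decays linearly to zero. This is a minimal illustration, not the authors' implementation; the names `Skill`, `token_budget`, and `select_skills`, and the greedy packing heuristic, are all assumptions for the sketch.

```python
# Hypothetical sketch of SKILL0's Dynamic Curriculum (not the authors' code):
# keep only skills the current policy still benefits from, packed greedily
# into a linearly decaying skill-context token budget.
from dataclasses import dataclass
from typing import List


@dataclass
class Skill:
    name: str
    tokens: int         # context cost of injecting this skill file
    helpfulness: float  # estimated on-policy gain, e.g. reward with vs. without the skill


def token_budget(step: int, total_steps: int, initial_budget: int) -> int:
    """Linearly decay the skill-context budget from initial_budget down to 0."""
    frac = max(0.0, 1.0 - step / total_steps)
    return int(initial_budget * frac)


def select_skills(skills: List[Skill], step: int, total_steps: int,
                  initial_budget: int) -> List[Skill]:
    """Retain skills with positive on-policy helpfulness, most helpful first,
    while they fit in the current (decaying) budget."""
    budget = token_budget(step, total_steps, initial_budget)
    chosen: List[Skill] = []
    for s in sorted(skills, key=lambda s: s.helpfulness, reverse=True):
        if s.helpfulness > 0 and s.tokens <= budget:
            chosen.append(s)
            budget -= s.tokens
    return chosen  # empty list once the budget reaches 0 -> fully zero-shot
```

Early in training the full budget admits every still-useful skill; as the budget shrinks, the policy is forced to operate with less and less injected context, until `select_skills` returns nothing and the agent runs zero-shot.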