Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning
Junhao Shen, Teng Zhang, Xiaoyan Zhao, Hong Cheng
TLDR
SLIM dynamically manages external skills for LLM agents in RL, optimizing their active skill set for improved task performance.
Key contributions
- Introduces SLIM, a framework for dynamic skill lifecycle management in agentic RL.
- Dynamically optimizes the active external skill set alongside policy learning.
- Estimates skill contribution via leave-one-skill-out validation for retention/retirement.
- Achieves an average improvement of 7.1 percentage points over the best baselines on ALFWorld and SearchQA.
Why it matters
This paper introduces a novel approach to managing external skills for LLM agents, addressing the limitations of static skill sets. By dynamically optimizing the active skill set alongside policy learning, SLIM delivers measurable performance gains and offers a more general framework for agentic RL. It also shows that external skills can provide continued value even as some of them are absorbed into the policy.
Original Abstract
Large language model agents increasingly rely on external skills to solve complex tasks, where skills act as modular units that extend their capabilities beyond what parametric memory alone supports. Existing methods assume external skills either accumulate as persistent guidance or are internalized into the policy, eventually leading to zero-skill inference. We argue this assumption is overly restrictive, since with limited parametric capacity and uneven marginal contribution across skills, the optimal active skill set is non-monotonic, task-dependent, and stage-dependent. In this work, we propose SLIM, a framework of dynamic Skill LIfecycle Management for agentic reinforcement learning (RL), which treats the active external skill set as a dynamic optimization variable jointly updated with policy learning. Specifically, SLIM estimates each active skill's marginal external contribution through leave-one-skill-out validation, then applies three lifecycle operations: retaining high-value skills, retiring skills whose contribution becomes negligible after sufficient exposure, and expanding the skill bank when persistent failures reveal missing capability coverage. Experiments show that SLIM outperforms the best baselines by an average of 7.1 percentage points across ALFWorld and SearchQA. Results further indicate that policy learning and external skill retention are not mutually exclusive: some skills are absorbed into the policy, while others continue to provide external value, supporting SLIM as a more general paradigm for skill-based agentic RL.
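The lifecycle loop the abstract describes — estimate each skill's marginal contribution via leave-one-skill-out validation, then retain, retire, or expand — can be sketched roughly as below. All names, thresholds, and the `evaluate` interface here are illustrative assumptions, not the paper's actual implementation.

```python
import random

def loso_contributions(active_skills, evaluate):
    """Leave-one-skill-out: a skill's external contribution is the drop in
    validation success rate when it is removed from the active set.
    `evaluate(skill_set) -> success rate in [0, 1]` is an assumed interface."""
    base = evaluate(active_skills)
    return {s: base - evaluate([t for t in active_skills if t != s])
            for s in active_skills}

def lifecycle_step(active_skills, exposure, evaluate, skill_bank,
                   failure_rate, retire_eps=0.01, min_exposure=50,
                   expand_threshold=0.3):
    """One hypothetical SLIM-style update of the active skill set.
    `exposure[s]` counts how often skill s has been used; thresholds are
    placeholder values, not taken from the paper."""
    contrib = loso_contributions(active_skills, evaluate)
    # Retain high-value skills; retire skills whose contribution is
    # negligible after sufficient exposure (likely internalized by the policy).
    kept = [s for s in active_skills
            if contrib[s] > retire_eps or exposure[s] < min_exposure]
    # Expand from the skill bank when persistent failures suggest
    # missing capability coverage.
    if failure_rate > expand_threshold:
        candidates = [s for s in skill_bank if s not in kept]
        if candidates:
            kept.append(random.choice(candidates))
    return kept
```

In this sketch the active set can shrink and grow over training, matching the paper's claim that the optimal set is non-monotonic: a skill with near-zero leave-one-out contribution is treated as absorbed into the policy and retired, while persistent failures trigger expansion from the bank.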