Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
Xue Qin, Simin Luan, John See, Cong Yang, Zhijun Li
TLDR
This paper introduces an atomic-quality probe and a Hybrid Selector that govern skill updates in compositional robot policies, recovering most of the accuracy of full composition revalidation at a fraction of its cost.
Key contributions
- Introduces a paired-sampling protocol to analyze skill updates in compositional robot policies.
- Discovers a "dominant-skill effect" on a dual-arm peg-in-hole task: whether one high-performing skill enters a composition shifts the success rate by up to 50 percentage points.
- Shows that off-policy behavioral distance metrics fail to identify the dominant skill across three tasks.
- Proposes an atomic-quality probe and Hybrid Selector for efficient skill-update governance.
Why it matters
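Robot skill libraries are continually updated through fine-tuning, fresh demonstrations, or domain adaptation, but existing typed-composition methods treat the library as frozen at test time and do not analyze what happens when a skill is replaced. This paper proposes a cheap, deployment-oriented check for deciding which skill updates to accept, reserving expensive composition revalidation for the cases where the cheap check is not enough.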
Original Abstract
Skill libraries in deployed robotic systems are continually updated through fine-tuning, fresh demonstrations, or domain adaptation, yet existing typed-composition methods (BLADE, SymSkill, Generative Skill Chaining) treat the library as frozen at test time and do not analyze how composition outcomes change when a skill is replaced. We introduce a paired-sampling cross-version swap protocol on robosuite manipulation tasks to characterize this dimension of compositional skill learning. On a dual-arm peg-in-hole task we discover a dominant-skill effect: one ECM achieves 86.7% atomic success rate while every other ECM is at or below 26.7%, and whether this dominant ECM enters a composition shifts the success rate by up to +50pp. We characterize the boundary on a simpler pick task where all atomic policies saturate at 100% and the effect is undefined. Across three tasks we further find that off-policy behavioral distance metrics fail to identify the dominant ECM, ruling out the natural cheap predictor. We propose an atomic-quality probe and a Hybrid Selector combining per-skill probes (zero per-decision cost) with selective composition revalidation (full cost), and characterize its Pareto frontier on 144 skill-update decisions. On T6 the atomic-only probe sits 23pp below full revalidation (64.6% vs 87.5% oracle match) at zero per-decision cost; a Hybrid Selector with m=10 closes most of that gap to ~12pp at 46% of full-revalidation cost. On the cross-task average over 144 events, atomic-only is within 3pp of full revalidation under a mixed-oracle caveat. The atomic-quality probe is, to our knowledge, the first principled, deployment-ready primitive for skill-update governance in compositional robot policies.
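To make the idea of combining a cheap per-skill probe with selective revalidation concrete, here is a minimal Python sketch. It is not the authors' implementation: the function name `hybrid_select`, the `margin` threshold, and the rule "revalidate only when the atomic success rates are close" are all assumptions made for illustration, and `margin` is unrelated to the paper's parameter m, whose meaning the abstract does not define.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Decision:
    accept: bool       # whether to adopt the new skill version
    revalidated: bool  # whether the expensive composition check was run


def hybrid_select(
    old_atomic: float,
    new_atomic: float,
    revalidate: Callable[[], bool],
    margin: float = 0.1,  # hypothetical threshold, not taken from the paper
) -> Decision:
    """Decide on a skill update from atomic success rates, falling back
    to full composition revalidation only when the probe is inconclusive."""
    gap = new_atomic - old_atomic
    if gap >= margin:
        # New version is clearly better in isolation: accept at no extra cost.
        return Decision(accept=True, revalidated=False)
    if gap <= -margin:
        # New version is clearly worse in isolation: reject at no extra cost.
        return Decision(accept=False, revalidated=False)
    # Ambiguous probe result: pay for full composition revalidation.
    return Decision(accept=revalidate(), revalidated=True)


# Example using the atomic success rates quoted in the abstract (26.7% vs 86.7%).
clear = hybrid_select(0.267, 0.867, revalidate=lambda: False)
close = hybrid_select(0.50, 0.55, revalidate=lambda: True)
```

In this sketch the first call is settled by the probe alone, while the second falls inside the margin and triggers the (stubbed) revalidation callback. Widening `margin` trades more revalidation cost for fewer probe-only decisions, which is the kind of cost/accuracy trade-off a Pareto frontier would describe.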