Affordance Agent Harness: Verification-Gated Skill Orchestration
Haojian Huang, Jiahao Shi, Yinchuan Li, Yingcong Chen
TLDR
Affordance Agent Harness is a closed-loop runtime that adaptively orchestrates skills with verification gating and episodic memory, improving affordance-grounding accuracy while reducing inference cost.
Key contributions
- Introduces Affordance Agent Harness, a closed-loop runtime for adaptive skill orchestration.
- Uses a Router to dynamically select and parameterize skills based on per-instance task difficulty.
- Incorporates an evidence store and episodic memory for efficient reuse of experience.
- Employs a Verifier to gate commitments, ensuring reliability and triggering targeted retries.
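The contributions above describe one closed loop: retrieve a memory prior, let the Router pick skills until the Verifier clears the evidence, then fuse and store the result. A minimal sketch follows; the function names, skill set, score fields, and thresholds are all illustrative assumptions, not the paper's actual API:

```python
# Toy verification-gated orchestration loop. All names (router, verifier,
# judge, skills, memory layout) are hypothetical stand-ins for the harness.

def router(evidence):
    # Escalate from a cheap detector to a costlier segmenter as evidence grows.
    return "detect" if not evidence else "segment"

def verifier(evidence, threshold=0.8):
    # Gate commitment on evidence sufficiency: commit only once the best
    # score so far clears a confidence threshold.
    return max(e["score"] for e in evidence) >= threshold

def judge(evidence):
    # Fuse accumulated evidence: keep the highest-confidence region.
    return max(evidence, key=lambda e: e["score"])

def run_episode(scene, skills, memory, max_calls=4):
    # Episodic memory provides priors for recurring categories.
    evidence = list(memory.get(scene["category"], []))
    calls = 0
    while calls < max_calls:
        if evidence and verifier(evidence):   # gate before spending a call
            break
        skill = router(evidence)              # Router picks the next skill
        evidence.append(skills[skill](scene)) # targeted retry adds evidence
        calls += 1
    prediction = judge(evidence)
    memory.setdefault(scene["category"], []).append(prediction)
    return prediction, calls

# Toy skills: a weak detector and a stronger segmenter.
skills = {
    "detect":  lambda s: {"region": "coarse box",  "score": 0.55},
    "segment": lambda s: {"region": "handle mask", "score": 0.9},
}
memory = {}
pred, calls = run_episode({"category": "mug"}, skills, memory)
```

On the first episode the detector alone fails the gate, so the Router escalates to the segmenter (2 skill calls); on a repeat of the same category the memory prior clears the gate with 0 calls, which is the cost saving the paper attributes to experience reuse.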
Why it matters
Existing affordance systems rely on fixed pipelines that are poorly matched to per-instance difficulty and offer little recovery from intermediate errors. This paper introduces a runtime that adaptively orchestrates skills under verification gating, improving grounding quality and efficiency. It's a step towards more robust and reliable open-world robotic manipulation.
Original Abstract
Affordance grounding requires identifying where and how an agent should interact in open-world scenes, where actionable regions are often small, occluded, reflective, and visually ambiguous. Recent systems therefore combine multiple skills (e.g., detection, segmentation, interaction-imagination), yet most orchestrate them with fixed pipelines that are poorly matched to per-instance difficulty, offer limited targeted recovery from intermediate errors, and fail to reuse experience from recurring objects. These failures expose a systems problem: test-time grounding must acquire the right evidence, decide whether that evidence is reliable enough to commit, and do so under bounded inference cost without access to labels. We propose Affordance Agent Harness, a closed-loop runtime that unifies heterogeneous skills with an evidence store and cost control, retrieves episodic memories to provide priors for recurring categories, and employs a Router to adaptively select and parameterize skills. An affordance-specific Verifier then gates commitments using self-consistency, cross-scale stability, and evidence sufficiency, triggering targeted retries before a final judge fuses accumulated evidence and trajectories into the prediction. Experiments on multiple affordance benchmarks and difficulty-controlled subsets show a stronger accuracy-cost Pareto frontier than fixed-pipeline baselines, improving grounding quality while reducing average skill calls and latency. Project page: https://tenplusgood.github.io/a-harness-page/.
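The abstract names three checks the Verifier uses to gate a commitment: self-consistency, cross-scale stability, and evidence sufficiency. A hedged sketch of how such a gate could look, assuming masks are represented as pixel-index sets and using illustrative IoU thresholds (the paper does not specify these details):

```python
# Hypothetical three-check gate; representations and thresholds are assumed.

def iou(a, b):
    # Intersection-over-union between two pixel-index sets.
    return len(a & b) / len(a | b) if a | b else 0.0

def self_consistent(samples, thresh=0.7):
    # Repeated predictions on the same input should agree with each other.
    ref = samples[0]
    return all(iou(ref, s) >= thresh for s in samples[1:])

def cross_scale_stable(mask, rescaled_mask, thresh=0.7):
    # The prediction should survive re-running at a different input scale.
    return iou(mask, rescaled_mask) >= thresh

def sufficient(evidence, min_items=2):
    # Enough independent pieces of evidence have accumulated to commit.
    return len(evidence) >= min_items

def gate(samples, rescaled_mask, evidence):
    # Commit only if all three checks pass; otherwise trigger a retry.
    return (self_consistent(samples)
            and cross_scale_stable(samples[0], rescaled_mask)
            and sufficient(evidence))

# Example: nearly identical masks across samples and scales, two evidence items.
m = {1, 2, 3, 4}
ok = gate([m, m | {5}], m, ["detector hit", "segmenter mask"])
```

A failed gate (e.g. masks that disagree across samples) is what would trigger the targeted retries described in the abstract, before the final judge fuses whatever evidence accumulated.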