ArXiv TLDR

Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries

2605.06223

Junhyuk Kwon, Seungjoon Lee, Hyejin Park, Kyle Min, Jungseul Ok

cs.AI, cs.RO

TLDR

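ProCompNav is a two-stage framework for instance navigation under ambiguous user queries: it first builds a pool of candidate instances, then narrows the pool to the target through comparative judgment driven by binary yes/no questions.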

Key contributions

  • Proposes ProCompNav, a two-stage framework for ambiguous instance navigation.
  • Identifies target instances through comparative judgment using binary yes/no questions.
  • Extracts attribute-value pairs to split candidate pools and prune inconsistent candidates.
  • Improves Success Rate over both interactive and non-interactive baselines on CoIN-Bench while substantially reducing Response Length, and achieves state-of-the-art Success Rate on TextNav.
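
The pruning loop described in the contributions above can be illustrated with a small sketch. This is not the authors' implementation: the candidate representation (a dict of attribute names to values), the function names `pick_splitting_pair` and `identify_target`, the `answer_yes` callback standing in for the user, and the "most balanced split" selection heuristic are all assumptions made for illustration. The paper only states that the chosen attribute-value pair splits the current pool.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical representation: each candidate is a dict of attribute -> value.
Candidate = dict[str, str]


def pick_splitting_pair(pool: list[Candidate]) -> Optional[tuple[str, str]]:
    """Return an attribute-value pair that some, but not all, candidates share.

    Assumed heuristic: prefer the pair whose yes/no split is most balanced.
    """
    counts = Counter((attr, val) for cand in pool for attr, val in cand.items())
    best, best_gap = None, None
    for pair, n in sorted(counts.items()):  # sorted for deterministic ties
        if n == len(pool):
            continue  # shared by every candidate, so it cannot split the pool
        gap = abs(2 * n - len(pool))  # 0 means a perfect half/half split
        if best_gap is None or gap < best_gap:
            best, best_gap = pair, gap
    return best


def identify_target(
    pool: list[Candidate],
    answer_yes: Callable[[str, str], bool],
    max_rounds: int = 10,
) -> tuple[list[Candidate], int]:
    """Ask binary questions and prune inconsistent candidates each round."""
    pool = list(pool)
    rounds = 0
    while len(pool) > 1 and rounds < max_rounds:
        pair = pick_splitting_pair(pool)
        if pair is None:
            break  # remaining candidates are indistinguishable
        attr, val = pair
        said_yes = answer_yes(attr, val)  # e.g. "Is the color red?"
        # Keep only candidates consistent with the answer, all at once.
        pool = [c for c in pool if (c.get(attr) == val) == said_yes]
        rounds += 1
    return pool, rounds


candidates = [
    {"color": "red", "location": "kitchen"},
    {"color": "red", "location": "bedroom"},
    {"color": "blue", "location": "kitchen"},
    {"color": "blue", "location": "bedroom"},
]
target = candidates[1]
# Simulated user who answers truthfully about the target.
remaining, rounds = identify_target(candidates, lambda a, v: target.get(a) == v)
print(remaining, rounds)
```

In this toy pool of four candidates, each yes/no answer removes every candidate that disagrees with it, so the simulated user only ever replies "yes" or "no" rather than describing the target in free text.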

Why it matters

This paper addresses ambiguous user queries in natural-language instance navigation, where existing methods either burden users with detailed descriptions or fail to distinguish the target from similar distractors. ProCompNav takes a proactive approach by reframing disambiguation as pool-level discriminative questioning, in which each question is chosen to narrow the candidate set. The reported results show higher Success Rate on CoIN-Bench and TextNav together with shorter user responses.

Original Abstract

Natural-language instance navigation becomes challenging when the initial user request does not uniquely specify the target instance. A practical agent should reduce the user's burden by actively asking only the information needed to distinguish the target from similar distractors, rather than requiring a detailed description upfront. Existing approaches often fall short of this goal: they may stop at the first plausible candidate before sufficiently exploring alternatives, or, even after collecting multiple candidates, ask about the target's attributes derived from individual candidates rather than questions selected to distinguish candidates in the pool. As a result, despite the dialogue, the agent may still fail to distinguish the target from distractors, leading to premature decisions and lengthy user responses. We propose Proactive Instance Navigation with Comparative Judgment (ProCompNav), a two-stage framework that first constructs a candidate pool and then identifies the target through comparative judgment. At each round, ProCompNav extracts an attribute-value pair that splits the current pool, asks a binary yes/no question, and prunes all inconsistent candidates at once. This reframes disambiguation from open-ended target description to pool-level discriminative questioning, where each question is chosen to narrow the candidate set. On CoIN-Bench, ProCompNav improves Success Rate over interactive baselines with the same minimal input and non-interactive baselines with detailed descriptions, while substantially reducing Response Length. ProCompNav also achieves state-of-the-art Success Rate on TextNav, suggesting that comparative judgment is broadly useful for instance-level navigation among similar distractors.
