ArXiv TLDR

NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search

2605.00751

Sizhe Tang, Zuyuan Zhang, Mahdi Imani, Tian Lan

cs.LG

TLDR

NonZero keeps multi-agent Monte Carlo Tree Search tractable by proposing candidate joint actions through interaction-guided exploration instead of searching the exponentially large joint-action space directly.

Key contributions

  • Keeps multi-agent MCTS tractable by avoiding direct exploration of the full joint-action space.
  • Uses surrogate-guided selection over a low-dimensional representation with an interaction-guided proposal.
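  • Ranks single-agent deviations by predicted gain and scores two-agent deviations with a mixed-difference measure, which reveals coordination benefits even when no single agent can improve alone.
  • Formalizes candidate proposal as a bandit problem over local deviations and derives a proposal rule with a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating the joint-action space.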
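
To give a sense of scale for the contributions above, the following is a minimal sketch that compares the size of the full joint-action space with the number of single-agent and two-agent deviations from one reference joint action. It assumes every agent has the same number of actions and that every pair of agents may deviate together; the paper's "graph-local" wording suggests pairs may be restricted to an interaction graph, in which case the pair count here is only an upper bound. The function names are hypothetical and not taken from the paper.

```python
from math import comb


def joint_action_count(n_agents: int, n_actions: int) -> int:
    """Number of joint actions when each agent picks one of n_actions."""
    return n_actions ** n_agents


def local_deviation_count(n_agents: int, n_actions: int) -> tuple[int, int]:
    """Count deviations from a fixed reference joint action.

    Assumption: all agent pairs are allowed to deviate together
    (a complete interaction graph), so the pair count is an upper bound.
    """
    # One agent switches to any of its other actions.
    single = n_agents * (n_actions - 1)
    # Two distinct agents both switch to other actions.
    pair = comb(n_agents, 2) * (n_actions - 1) ** 2
    return single, pair


if __name__ == "__main__":
    n_agents, n_actions = 10, 5
    print(joint_action_count(n_agents, n_actions))     # 9765625
    print(local_deviation_count(n_agents, n_actions))  # (40, 720)
```

With 10 agents and 5 actions each, the joint space has about 9.8 million entries, while the local neighborhood under these assumptions has 760 candidates.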

Why it matters

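Multi-agent MCTS scales poorly because the number of joint actions grows exponentially with the number of agents, which limits exploration under realistic search budgets. NonZero addresses this by proposing candidates from local deviations rather than enumerating joint actions, making MCTS more practical for cooperative settings. In the reported experiments, it improves sample efficiency and final performance on MatGame, SMAC, and SMACv2 relative to model-based and model-free baselines under matched search budgets.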

Original Abstract

Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over a low-dimensional nonlinear representation using an interaction-guided proposal rule, instead of directly exploring the full joint-action space. Our exploration uses an interaction score: single-agent deviations are ranked by predicted gain, while two-agent deviations are scored by a mixed-difference measure that reveals coordination benefits even when no single agent can improve alone. We formalize candidate proposal as a bandit problem over local deviations and derive a proposal rule, NonZero, with a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating the joint-action space. Empirically, NonZero improves sample efficiency and final performance on MatGame, SMAC, and SMACv2 relative to strong model-based and model-free baselines under matched search budgets.
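
The interaction score described in the abstract can be illustrated on a toy two-agent game. The abstract does not state the exact formula, so this sketch assumes the standard second-order mixed difference, Q(a_i', a_j') - Q(a_i', a_j) - Q(a_i, a_j') + Q(a_i, a_j), and uses a hand-written payoff table in place of a learned surrogate. All names (`deviate`, `single_gain`, `mixed_difference`, `PAYOFF`) are hypothetical.

```python
from typing import Callable, Dict, Tuple

JointAction = Tuple[int, ...]
ValueFn = Callable[[JointAction], float]


def deviate(base: JointAction, changes: Dict[int, int]) -> JointAction:
    """Return a copy of base with the given agents' actions replaced."""
    action = list(base)
    for agent, new_action in changes.items():
        action[agent] = new_action
    return tuple(action)


def single_gain(q: ValueFn, base: JointAction, i: int, ai: int) -> float:
    """Predicted gain when only agent i switches to action ai."""
    return q(deviate(base, {i: ai})) - q(base)


def mixed_difference(q: ValueFn, base: JointAction,
                     i: int, ai: int, j: int, aj: int) -> float:
    """Second-order mixed difference for agents i and j (assumed form).

    It is zero when the two agents' contributions are purely additive,
    and positive when deviating together is worth more than the sum of
    deviating separately.
    """
    both = q(deviate(base, {i: ai, j: aj}))
    only_i = q(deviate(base, {i: ai}))
    only_j = q(deviate(base, {j: aj}))
    return both - only_i - only_j + q(base)


# Toy payoff table (illustrative, not from the paper): both agents must
# switch from action 0 to action 1 together to reach the better outcome.
PAYOFF: Dict[JointAction, float] = {
    (0, 0): 1.0, (1, 0): 0.0, (0, 1): 0.0, (1, 1): 2.0,
}


def toy_q(action: JointAction) -> float:
    return PAYOFF[action]


if __name__ == "__main__":
    base = (0, 0)
    print(single_gain(toy_q, base, 0, 1))             # -1.0
    print(single_gain(toy_q, base, 1, 1))             # -1.0
    print(mixed_difference(toy_q, base, 0, 1, 1, 1))  # 3.0
```

In this toy game neither agent gains by deviating alone from (0, 0), yet the mixed difference is positive, which is the kind of coordination benefit the abstract says single-agent rankings would miss.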
