NonZero: Interaction-Guided Exploration for Multi-Agent Monte Carlo Tree Search
Sizhe Tang, Zuyuan Zhang, Mahdi Imani, Tian Lan
TLDR
NonZero keeps cooperative multi-agent Monte Carlo Tree Search tractable by proposing candidate joint actions through interaction-guided exploration instead of searching the exponentially large joint-action space directly.
Key contributions
- Keeps multi-agent MCTS tractable by avoiding direct exploration of the full joint-action space.
- Uses surrogate-guided selection over a low-dimensional representation with an interaction-guided proposal.
- Ranks single-agent deviations by predicted gain and scores two-agent deviations with a mixed-difference measure, which reveals coordination benefits even when no single agent can improve alone.
- Derives a proposal rule with a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating the joint-action space.
Why it matters
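Multi-agent MCTS scales poorly because the number of joint actions grows exponentially with the number of agents, which limits how much of the tree can be explored under a realistic search budget. NonZero addresses this by searching over local deviations guided by an interaction score, making MCTS more practical for complex cooperative scenarios. Under matched search budgets, it improves sample efficiency and final performance on MatGame, SMAC, and SMACv2 relative to strong model-based and model-free baselines.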
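To make the scaling gap concrete, the following minimal sketch compares the size of the full joint-action space with the number of single- and two-agent deviations from one reference joint action. It assumes every agent has the same number of actions and counts all agent pairs; the function names are hypothetical, and the paper's actual candidate set (which refers to graph-local optima, so it may consider fewer pairs) is not specified here.

```python
from math import comb


def joint_action_count(n_agents: int, n_actions: int) -> int:
    """Number of joint actions when each agent picks one of n_actions."""
    return n_actions ** n_agents


def local_deviation_count(n_agents: int, n_actions: int) -> int:
    """Single- plus two-agent deviations from one reference joint action.

    Assumption: all agent pairs are counted, so this is an upper bound
    when only a subset of pairs is considered.
    """
    single = n_agents * (n_actions - 1)
    pairs = comb(n_agents, 2) * (n_actions - 1) ** 2
    return single + pairs


if __name__ == "__main__":
    # 10 agents with 5 actions each.
    print(joint_action_count(10, 5))     # 9765625
    print(local_deviation_count(10, 5))  # 760
```

The joint-action count grows exponentially in the number of agents, while the deviation count grows only quadratically.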
Original Abstract
Monte Carlo Tree Search (MCTS) scales poorly in cooperative multi-agent domains because expansion must consider an exponentially large set of joint actions, severely limiting exploration under realistic search budgets. We propose NonZero, which keeps multi-agent MCTS tractable by running surrogate-guided selection over a low-dimensional nonlinear representation using an interaction-guided proposal rule, instead of directly exploring the full joint-action space. Our exploration uses an interaction score: single-agent deviations are ranked by predicted gain, while two-agent deviations are scored by a mixed-difference measure that reveals coordination benefits even when no single agent can improve alone. We formalize candidate proposal as a bandit problem over local deviations and derive a proposal rule, NonZero, with a sublinear local-regret guarantee for reaching approximate graph-local optima without enumerating the joint-action space. Empirically, NonZero improves sample efficiency and final performance on MatGame, SMAC, and SMACv2 relative to strong model-based and model-free baselines under matched search budgets.
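The two kinds of score mentioned in the abstract can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's implementation: it takes the single-agent score to be the change in a value estimate when one agent deviates, and the two-agent score to be the standard second-order mixed difference. The value function `toy_q`, the payoff table, and all helper names are hypothetical.

```python
from typing import Callable, Dict, Tuple

JointAction = Tuple[int, ...]
ValueFn = Callable[[JointAction], float]


def deviate(base: JointAction, changes: Dict[int, int]) -> JointAction:
    """Return a copy of base with the given agents' actions replaced."""
    return tuple(changes.get(i, a) for i, a in enumerate(base))


def single_gain(q: ValueFn, base: JointAction, i: int, a_i: int) -> float:
    """Predicted gain when only agent i switches to action a_i."""
    return q(deviate(base, {i: a_i})) - q(base)


def mixed_difference(q: ValueFn, base: JointAction,
                     i: int, a_i: int, j: int, a_j: int) -> float:
    """Second-order mixed difference for agents i and j (assumed form).

    It is zero when the two deviations contribute additively and
    positive when they help more together than separately.
    """
    return (q(deviate(base, {i: a_i, j: a_j}))
            - q(deviate(base, {i: a_i}))
            - q(deviate(base, {j: a_j}))
            + q(base))


# Hypothetical two-agent coordination game: deviating alone hurts,
# deviating together helps.
PAYOFF = {(0, 0): 1.0, (1, 0): 0.0, (0, 1): 0.0, (1, 1): 3.0}


def toy_q(action: JointAction) -> float:
    return PAYOFF[action]


if __name__ == "__main__":
    base = (0, 0)
    print(single_gain(toy_q, base, 0, 1))             # -1.0
    print(single_gain(toy_q, base, 1, 1))             # -1.0
    print(mixed_difference(toy_q, base, 0, 1, 1, 1))  # 4.0
```

In this toy game, ranking single-agent deviations alone would never move away from `(0, 0)`, because each unilateral change lowers the value. The positive mixed difference flags the pair as worth proposing jointly, which is the situation the abstract describes.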