Multi-Agent Decision-Focused Learning via Value-Aware Sequential Communication
Benjamin Amoh, Geoffrey Parker, Wesley Marrero
TLDR
SeqComm-DFL improves multi-agent coordination under partial observability by optimizing communication directly for decision quality rather than intermediate proxies, yielding large performance gains on healthcare and SMAC benchmarks.
Key contributions
- Introduces SeqComm-DFL, unifying sequential communication with decision-focused learning for multi-agent tasks.
- Employs value-aware message generation with sequential Stackelberg conditioning: agents emit messages in priority order, each conditioning on its predecessors, to maximize receiver decision quality.
- Enables efficient end-to-end training using QMIX factorization and implicit differentiation.
- Achieves 4-6x higher rewards and >13% win rate improvements on healthcare and SMAC benchmarks.
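The sequential, value-aware messaging in the second contribution can be sketched as a greedy loop: agents transmit in a fixed priority order, and each conditions on the messages already sent before choosing its own. The sketch below is illustrative only; the `decision_value` stand-in, the discrete candidate set, and all variable names are assumptions, not the paper's learned world model.

```python
import numpy as np

def decision_value(obs, messages):
    """Stand-in for receiver decision quality: here, how close the pooled
    signal is to zero (purely illustrative; the paper instead scores
    messages through a communication-augmented world model)."""
    pooled = obs + sum(messages)
    return -float(np.abs(pooled).sum())

def sequential_messages(observations, candidates, order):
    """Agents transmit in a fixed priority order; each agent conditions on
    its predecessors' messages (the Stackelberg structure) and greedily
    picks the candidate message that maximizes decision value."""
    sent = []
    for i in order:
        best = max(candidates,
                   key=lambda m: decision_value(observations[i], sent + [m]))
        sent.append(best)
    return sent

# Hypothetical setup: three agents, three candidate scalar messages.
obs = [np.array([0.5]), np.array([-1.2]), np.array([0.3])]
cands = [np.array([-1.0]), np.array([0.0]), np.array([1.0])]
msgs = sequential_messages(obs, cands, order=[0, 1, 2])
```

The key structural point the sketch captures is that later agents see earlier messages, so the priority ordering (the paper's prosocial ordering) shapes which information gets shared first.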
Why it matters
This paper addresses a key challenge in multi-agent systems: optimizing communication for actual decision quality rather than intermediate proxies. By introducing SeqComm-DFL, it enables more effective coordination under partial observability. The significant performance gains demonstrate its potential for complex real-world multi-agent applications.
Original Abstract
Multi-agent coordination under partial observability requires agents to share complementary private information. While recent methods optimize messages for intermediate objectives (e.g., reconstruction accuracy or mutual information) rather than decision quality, we introduce \textbf{SeqComm-DFL}, unifying sequential communication with decision-focused learning for task performance. Our approach features \emph{value-aware message generation with sequential Stackelberg conditioning}: messages maximize receiver decision quality and are generated in priority order, with agents conditioning on their predecessors; the \emph{guidance potential} is determined by their prosocial ordering. We extend Optimal Model Design to communication-augmented world models with QMIX factorization, enabling efficient end-to-end training via implicit differentiation. We prove information-theoretic bounds showing that communication value scales with coordination gaps and establish $\mathcal{O}(1/\sqrt{T})$ convergence for the bilevel optimization, where $T$ denotes the number of training iterations. On collaborative healthcare and StarCraft Multi-Agent Challenge (SMAC) benchmarks, SeqComm-DFL achieves four to six times higher cumulative rewards and over 13\% win rate improvements, enabling coordination strategies inaccessible under information asymmetry.
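The QMIX factorization the abstract relies on has one defining property: per-agent utilities are mixed with nonnegative weights, so improving any single agent's Q-value never lowers the joint value, and decentralized per-agent argmax agrees with the centralized argmax. A minimal sketch of that monotonic mixing, assuming a linear mixer for illustration (QMIX proper uses a hypernetwork-generated nonlinear mixer):

```python
import numpy as np

def qmix_value(agent_qs, w, b):
    """Sketch of a QMIX-style monotonic mixer (illustrative, not the
    paper's network): per-agent Q-values are combined with nonnegative
    weights, guaranteeing the joint value is monotone in each agent's Q."""
    w = np.abs(w)  # enforce the nonnegativity constraint on mixing weights
    return float(w @ np.asarray(agent_qs) + b)

# Hypothetical per-agent Q-values and raw (possibly negative) weights.
qs = np.array([1.0, 2.0, 0.5])
w = np.array([0.3, -0.7, 0.2])  # abs() above makes these valid
joint = qmix_value(qs, w, b=0.1)
```

Monotonicity is what makes this factorization compatible with end-to-end decision-focused training: gradients of the joint value push each agent's utility in a consistent direction.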