MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems
Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang + 2 more
TLDR
MASPO optimizes prompts for LLM multi-agent systems by jointly evaluating their impact on successor agents, improving collaborative task performance.
Key contributions
- Introduces MASPO, a framework for automatic, iterative prompt refinement in LLM multi-agent systems.
- Uses a joint evaluation mechanism, assessing prompts by their impact on successor agents' downstream success.
- Bridges local-global objective gaps without relying on ground-truth labels for prompt evaluation.
- Employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space.
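The joint evaluation idea above can be sketched in a few lines. This is a hypothetical toy, not the paper's implementation: `run_agent`, `successor_success`, and `joint_score` are invented names, and the LLM calls are replaced by string stubs. The point it illustrates is that a candidate prompt is scored by how well *successor* agents do with its output, not by any local label.

```python
# Hypothetical sketch of MASPO-style joint prompt evaluation (all names
# invented). A candidate prompt for one agent is scored by how well
# successor agents succeed when consuming that agent's output -- no
# ground-truth labels for the intermediate output are needed.

from typing import Callable, List

def run_agent(prompt: str, message: str) -> str:
    """Stand-in for an LLM call: the agent just tags its input with its prompt."""
    return f"{prompt}:{message}"

def successor_success(output: str) -> float:
    """Stand-in proxy metric: this successor 'succeeds' only when the
    upstream output contains the directive it needs (here, 'plan')."""
    return 1.0 if "plan" in output else 0.0

def joint_score(prompt: str, inputs: List[str],
                successors: List[Callable[[str], float]]) -> float:
    """Average downstream success across inputs and successor agents."""
    total = 0.0
    for msg in inputs:
        out = run_agent(prompt, msg)
        total += sum(s(out) for s in successors) / len(successors)
    return total / len(inputs)

# A prompt that steers the agent toward producing what its successor
# needs scores higher than a locally plausible but unhelpful one.
inputs = ["task1", "task2"]
print(joint_score("plan", inputs, [successor_success]))    # 1.0
print(joint_score("answer", inputs, [successor_success]))  # 0.0
```

In a real system the stubs would be model calls and the successor metric would be a downstream success signal, but the scoring structure stays the same.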
Why it matters
Jointly optimizing prompts in LLM multi-agent systems is challenging because each agent's local objective can diverge from the system-level goal. MASPO addresses this by automatically refining prompts across the whole system, tying local interactions to global outcomes, which improves both the performance and robustness of collaborative LLM agents.
Original Abstract
Large language model (LLM)-based Multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iteratively refine prompts across the entire system. A core innovation of MASPO is its joint evaluation mechanism, which assesses prompts not merely by their local validity, but by their capacity to facilitate downstream success for successor agents. This effectively bridges the gap between local interactions and global outcomes without relying on ground-truth labels. Furthermore, MASPO employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space. Extensive empirical evaluations across 6 diverse tasks demonstrate that MASPO consistently outperforms state-of-the-art prompt optimization methods, achieving an average accuracy improvement of 2.9. We release our code at https://github.com/wangzx1219/MASPO.
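The "evolutionary beam search" the abstract mentions can be sketched with a toy fitness function. All names here are invented, and the mutation and fitness stand-ins (single-character rewrites, character overlap with a hidden target) replace the paper's LLM-driven rewriting and task-level evaluation; only the search structure (mutate the beam, score candidates, keep the top-k) reflects the described technique.

```python
# Hypothetical sketch of an evolutionary beam search over prompts.
# Each round mutates the current beam, scores all candidates with a
# system-level fitness, and keeps the top-k survivors.

import random

def fitness(prompt: str, target: str = "be concise and cite sources") -> float:
    """Toy stand-in for system-level evaluation: character overlap with target."""
    return sum(a == b for a, b in zip(prompt, target)) / max(len(target), 1)

def mutate(prompt: str, vocab: str = "abcdefghijklmnopqrstuvwxyz ") -> str:
    """Randomly rewrite one character -- a stand-in for LLM-driven rewriting."""
    if not prompt:
        return random.choice(vocab)
    i = random.randrange(len(prompt))
    return prompt[:i] + random.choice(vocab) + prompt[i + 1:]

def beam_search(seed: str, beam_width: int = 4, rounds: int = 200,
                children: int = 4) -> str:
    beam = [seed]
    for _ in range(rounds):
        # Parents stay in the pool, so the best fitness never decreases.
        candidates = list(beam)
        for p in beam:
            candidates.extend(mutate(p) for _ in range(children))
        beam = sorted(candidates, key=fitness, reverse=True)[:beam_width]
    return beam[0]

random.seed(0)
best = beam_search("x" * 28)
print(best, fitness(best))
```

Keeping parents in the candidate pool makes the search monotone in fitness, and the beam (rather than a single survivor) keeps several distinct prompt lineages alive at once, which is the usual motivation for beam-style search in a large discrete space.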