MASPO: Joint Prompt Optimization for LLM-based Multi-Agent Systems
Zhexuan Wang, Xuebo Liu, Li Wang, Zifei Shan, Yutong Wang + 2 more
TLDR
MASPO optimizes prompts for LLM multi-agent systems by jointly evaluating their impact on successor agents, improving collaborative task performance.
Key contributions
- Introduces MASPO, a framework for automatic, iterative prompt refinement in LLM multi-agent systems.
- Uses a joint evaluation mechanism, assessing prompts by their impact on successor agents' downstream success.
- Bridges local-global objective gaps without relying on ground-truth labels for prompt evaluation.
- Employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space.
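The joint evaluation idea above can be sketched in a few lines. This is a hypothetical toy, not the paper's implementation: `run_agent`, `successor_success`, and `joint_score` are invented names, and the LLM calls are replaced by string stubs. The point it illustrates is that a candidate prompt is scored by how well *successor* agents do with its output, not by any local label.

```python
# Hypothetical sketch of MASPO-style joint prompt evaluation (all names
# invented). A candidate prompt for one agent is scored by how well
# successor agents succeed when consuming that agent's output -- no
# ground-truth labels for the intermediate output are needed.

from typing import Callable, List

def run_agent(prompt: str, message: str) -> str:
    """Stand-in for an LLM call: the agent just tags its input with its prompt."""
    return f"{prompt}:{message}"

def successor_success(output: str) -> float:
    """Stand-in proxy metric: this successor 'succeeds' only when the
    upstream output contains the directive it needs (here, 'plan')."""
    return 1.0 if "plan" in output else 0.0

def joint_score(prompt: str, inputs: List[str],
                successors: List[Callable[[str], float]]) -> float:
    """Average downstream success across inputs and successor agents."""
    total = 0.0
    for msg in inputs:
        out = run_agent(prompt, msg)
        total += sum(s(out) for s in successors) / len(successors)
    return total / len(inputs)

# A prompt that steers the agent toward producing what its successor
# needs scores higher than a locally plausible but unhelpful one.
inputs = ["task1", "task2"]
print(joint_score("plan", inputs, [successor_success]))    # 1.0
print(joint_score("answer", inputs, [successor_success]))  # 0.0
```

In a real system the stubs would be model calls and the successor metric would be a downstream success signal, but the scoring structure stays the same.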
Why it matters
Jointly optimizing prompts in LLM multi-agent systems is challenging because each agent's local objective can diverge from the system-level goal. MASPO addresses this by automatically refining prompts across the whole system, tying local interactions to global outcomes, which improves both the performance and robustness of collaborative LLM agents.
Original Abstract
Large language model (LLM)-based Multi-agent systems (MAS) have shown promise in tackling complex collaborative tasks, where agents are typically orchestrated via role-specific prompts. While the quality of these prompts is pivotal, jointly optimizing them across interacting agents remains a non-trivial challenge, primarily due to the misalignment between local agent objectives and holistic system goals. To address this, we introduce MASPO, a novel framework designed to automatically and iteratively refine prompts across the entire system. A core innovation of MASPO is its joint evaluation mechanism, which assesses prompts not merely by their local validity, but by their capacity to facilitate downstream success for successor agents. This effectively bridges the gap between local interactions and global outcomes without relying on ground-truth labels. Furthermore, MASPO employs a data-driven evolutionary beam search to efficiently navigate the high-dimensional prompt space. Extensive empirical evaluations across 6 diverse tasks demonstrate that MASPO consistently outperforms state-of-the-art prompt optimization methods, achieving an average accuracy improvement of 2.9. We release our code at https://github.com/wangzx1219/MASPO.
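The "evolutionary beam search" the abstract mentions can be sketched with a toy fitness function. All names here are invented, and the mutation and fitness stand-ins (single-character rewrites, character overlap with a hidden target) replace the paper's LLM-driven rewriting and task-level evaluation; only the search structure (mutate the beam, score candidates, keep the top-k) reflects the described technique.

```python
# Hypothetical sketch of an evolutionary beam search over prompts.
# Each round mutates the current beam, scores all candidates with a
# system-level fitness, and keeps the top-k survivors.

import random

def fitness(prompt: str, target: str = "be concise and cite sources") -> float:
    """Toy stand-in for system-level evaluation: character overlap with target."""
    return sum(a == b for a, b in zip(prompt, target)) / max(len(target), 1)

def mutate(prompt: str, vocab: str = "abcdefghijklmnopqrstuvwxyz ") -> str:
    """Randomly rewrite one character -- a stand-in for LLM-driven rewriting."""
    if not prompt:
        return random.choice(vocab)
    i = random.randrange(len(prompt))
    return prompt[:i] + random.choice(vocab) + prompt[i + 1:]

def beam_search(seed: str, beam_width: int = 4, rounds: int = 200,
                children: int = 4) -> str:
    beam = [seed]
    for _ in range(rounds):
        # Parents stay in the pool, so the best fitness never decreases.
        candidates = list(beam)
        for p in beam:
            candidates.extend(mutate(p) for _ in range(children))
        beam = sorted(candidates, key=fitness, reverse=True)[:beam_width]
    return beam[0]

random.seed(0)
best = beam_search("x" * 28)
print(best, fitness(best))
```

Keeping parents in the candidate pool makes the search monotone in fitness, and the beam (rather than a single survivor) keeps several distinct prompt lineages alive at once, which is the usual motivation for beam-style search in a large discrete space.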