CollabCoder: Plan-Code Co-Evolution via Collaborative Decision-Making for Efficient Code Generation

April 15, 2026 · arXiv:2604.13946

Duy Tung Doan, Quang Huy Phung, Dzung Nguyen, Khac-Hoai Nam Bui

cs.SE, cs.CL

TLDR

CollabCoder introduces a Plan-Code Co-Evolution framework using dynamic multi-agent collaboration for efficient and robust automated code generation.

Key contributions

Introduces CollabCoder, a Plan-Code Co-Evolution framework that improves code generation through dynamic multi-agent collaboration.
Uses a collaborative decision-making process in which the plan and code modules decide which of them to execute during debugging.
Consistently improves code quality and robustness across widely used benchmarks.
Matches or exceeds SOTA performance while reducing computational overhead and API calls.
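The summary does not say how the plan and code modules reach their decision. The following is a minimal Python sketch under stated assumptions, not CollabCoder's actual method. Each module reports a hypothetical estimate that a test failure originates in the plan, and a weighted average decides whether to re-execute the plan module (rewrite the plan, then regenerate code) or the code module (patch the code only). All names (`Vote`, `decide_module`, `debug_loop`, and the callables passed to it) are illustrative stand-ins for LLM-backed components.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Vote:
    """One module's opinion on where a failure originates (hypothetical format)."""
    plan_at_fault: float  # estimated probability in [0, 1] that the plan is wrong
    weight: float = 1.0   # hypothetical trust weight for this module's vote


def decide_module(votes: List[Vote], threshold: float = 0.5) -> str:
    """Aggregate the modules' votes into one decision: 'plan' or 'code'.

    Assumption: a weighted average of per-module estimates; the paper's
    actual decision rule is not described in this summary.
    """
    total = sum(v.weight for v in votes)
    if total <= 0:
        raise ValueError("vote weights must sum to a positive value")
    score = sum(v.plan_at_fault * v.weight for v in votes) / total
    return "plan" if score >= threshold else "code"


def debug_loop(
    plan: str,
    code: str,
    run_tests: Callable[[str], List[str]],
    collect_votes: Callable[[str, str, List[str]], List[Vote]],
    revise_plan: Callable[[str, List[str]], str],
    implement: Callable[[str], str],
    revise_code: Callable[[str, List[str]], str],
    max_iters: int = 5,
) -> Tuple[str, str, List[str]]:
    """Run tests; on failure, let the modules jointly pick which one to re-execute.

    Returns the final plan, the final code, and the decisions taken.
    """
    decisions: List[str] = []
    for _ in range(max_iters):
        failures = run_tests(code)
        if not failures:
            break  # all tests pass: stop early and save further calls
        choice = decide_module(collect_votes(plan, code, failures))
        decisions.append(choice)
        if choice == "plan":
            # Plan-level fix: rewrite the plan, then regenerate code from it.
            plan = revise_plan(plan, failures)
            code = implement(plan)
        else:
            # Code-level fix: patch the code against the current plan.
            code = revise_code(code, failures)
    return plan, code, decisions


if __name__ == "__main__":
    # Toy stubs standing in for LLM-backed modules (all hypothetical).
    def run_tests(code): return [] if code == "v2-patched" else ["test_edge_case failed"]

    def collect_votes(plan, code, failures):
        # Both modules blame the original plan, then blame the code once it is revised.
        p = 0.8 if plan == "v1" else 0.2
        return [Vote(p), Vote(p, weight=0.5)]

    def revise_plan(plan, failures): return "v2"
    def implement(plan): return f"{plan}-code"
    def revise_code(code, failures): return "v2-patched"

    print(debug_loop("v1", "v1-code", run_tests, collect_votes,
                     revise_plan, implement, revise_code))
    # prints ('v2', 'v2-patched', ['plan', 'code'])
```

In this sketch, a code-level decision touches only the code, so the more expensive path (rewriting the plan and regenerating the code) runs only when the modules jointly blame the plan.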

Why it matters

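Automated code generation is held back by static planning, isolated execution, and high computational overhead. CollabCoder addresses these issues with dynamic collaboration between its plan and code modules, yielding more efficient and robust solutions. It matches or exceeds state-of-the-art results with fewer API calls, and its efficiency gains grow as benchmark difficulty increases, which makes complex code generation more practical.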

Original Abstract

Automated code generation remains a persistent challenge in software engineering, as conventional multi-agent frameworks are often constrained by static planning, isolated execution, high computational overhead, and limited adaptability to complex tasks. This paper introduces CollabCoder, a novel Plan-Code Co-Evolution framework that improves code generation through dynamic multi-agent collaboration. The core idea is to design a collaborative decision-making process between the plan module and the code module to decide which module should be executed for the debugging process. Extensive experiments on widely used benchmarks demonstrate that CollabCoder consistently improves code quality and robustness across tasks. Importantly, CollabCoder achieves performance comparable to or exceeding current state-of-the-art methods while reducing computational overhead, with efficiency gains becoming more pronounced as benchmark difficulty increases. On the more challenging LiveCodeBench and xCodeEval benchmarks, our approach improves performance by 11-20% over strong baselines while reducing the number of API calls by an average of 4-10 per execution.


