Empowering Autonomous Debugging Agents with Efficient Dynamic Analysis
Jiahong Xiang, Xiaoyang Xu, Xiaopan Chu, Hongliang Tian, Yuqun Zhang
TLDR
Introduces Agent-centric Debugging Interface (ADI) for LLM-based agents, enabling efficient function-level debugging and significantly improving autonomous program repair.
Key contributions
- Introduces Agent-centric Debugging Interface (ADI) for efficient, function-level debugging in LLM-based agents.
- Powers the interface with a Frame Lifetime Trace, a data structure capturing a function's stateful execution, plus high-level navigational commands.
- Resolves 63.8% of SWE-bench Verified tasks with a basic agent, slightly outperforming Claude-Tools at an average cost of USD 1.28 per task.
- Demonstrates plug-and-play generality, boosting SOTA agents by 6.2% to 18.5% on resolved tasks.
Why it matters
Autonomous debugging agents are limited by inefficient debuggers. This paper introduces ADI, a function-level interface that significantly boosts LLM agent efficiency and repair capabilities. It achieves strong results on SWE-bench at low cost, marking a key advancement for automated program repair.
Original Abstract
Autonomous agents for automated program repair represent a promising frontier in software engineering, yet their effectiveness is often hindered by reliance on post-mortem, coarse-grained execution feedback. While integrating traditional interactive debuggers seems a natural solution, their low-level, line-by-line interaction paradigm turns out to be cost-inefficient for LLM-based agents, leading to exhausted budgets and unproductive loops. To mitigate this, we introduce Agent-centric Debugging Interface (ADI), a novel agent-centric debugging interface designed for cost-efficient, end-to-end autonomous interaction. Specifically, Agent-centric Debugging Interface realizes a function-level interaction paradigm, powered by our Frame Lifetime Trace, a comprehensive data structure encapsulating a function's stateful execution trace, and a set of high-level navigational commands. Our extensive evaluation on the SWE-bench benchmark demonstrates the effectiveness and efficiency of ADI. By simply equipping a basic agent with ADI, it successfully resolves 63.8% of the tasks on the SWE-bench Verified set, even slightly outperforming the highly optimized and high-investment Claude-Tools agent, at an average cost of USD 1.28 per task with Claude-Sonnet-3.7. Furthermore, we demonstrate ADI's generality by integrating it as a plug-and-play component into existing SOTA agents, delivering consistent gains ranging from 6.2% to 18.5% on the resolved tasks. These results indicate that Agent-centric Debugging Interface can provide a general and efficient enhancement for existing autonomous agents.
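To make the function-level paradigm concrete, here is a minimal sketch of what a Frame Lifetime Trace and its navigational commands could look like. The paper does not publish this schema; the class names, fields, and methods below (`FrameEvent`, `FrameLifetimeTrace`, `step_to_line`, `variable_history`) are illustrative assumptions, not the authors' implementation.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any

@dataclass
class FrameEvent:
    """One recorded point in a function frame's lifetime (hypothetical schema)."""
    line: int                        # source line that was executed
    locals_snapshot: dict[str, Any]  # local variable values at this point

@dataclass
class FrameLifetimeTrace:
    """Sketch of a trace encapsulating one function call's stateful execution."""
    function: str
    args: dict[str, Any]
    events: list[FrameEvent] = field(default_factory=list)
    return_value: Any = None

    # High-level navigational commands an agent might issue, instead of
    # stepping line by line through a traditional debugger.
    def step_to_line(self, line: int) -> FrameEvent | None:
        """Jump directly to the first recorded event at a given source line."""
        return next((e for e in self.events if e.line == line), None)

    def variable_history(self, name: str) -> list[Any]:
        """Return every value a local variable took across the call."""
        return [e.locals_snapshot[name]
                for e in self.events if name in e.locals_snapshot]
```

The design intuition: because the whole frame lifetime is materialized up front, the agent queries it with a few cheap commands rather than spending tokens on dozens of `step`/`next` round-trips, which is where line-level debuggers exhaust agent budgets.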