ArXiv TLDR

Coding Agents Don't Know When to Act

2605.07769

Thibaud Gloaguen, Niels Mündler, Mark Müller, Veselin Raychev, Martin Vechev

cs.SE

TLDR

Coding agents often fail to recognize when no code changes are needed, exhibiting an "action bias" and proposing unnecessary fixes.

Key contributions

  • Introduced FixedBench, a benchmark of 200 human-verified tasks in which no code changes are required, testing coding agents' ability to abstain from action.
  • Found that state-of-the-art agents propose unnecessary code changes (excluding tests and documentation) in 35-65% of already-fixed issues; a scoring sketch follows this list.
  • Identified an "action bias" in LLMs, where they act even when inaction is appropriate.
  • Showed that instructing agents to reproduce the issue before patching helps, but introduces a new failure mode: on partially fixed bugs, agents abstain even though a patch is still needed.
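
To make the pass/fail criterion above concrete, here is a minimal sketch of how a FixedBench-style check might score a run: the agent counts as abstaining only if its patch touches no files outside tests and documentation. This is an illustration, not the authors' code; the path heuristic and function names are assumptions.

    import re

    # Files matching these patterns count as tests or documentation and are
    # excluded from the "undesirable changes" metric. Heuristic, for illustration.
    TEST_OR_DOC = re.compile(r"(^|/)(tests?|docs?)(/|$)|(^|/)test_|\.(md|rst|txt)$")

    def changed_files(unified_diff: str) -> list[str]:
        """Collect target paths from the '+++ b/<path>' headers of a unified diff."""
        return [
            line[len("+++ b/"):].strip()
            for line in unified_diff.splitlines()
            if line.startswith("+++ b/")
        ]

    def abstained(unified_diff: str) -> bool:
        """True if every changed file is a test or doc file (or nothing changed)."""
        return all(TEST_OR_DOC.search(path) for path in changed_files(unified_diff))

    # Example: a patch that edits source code fails the abstention check.
    patch = "--- a/src/solver.py\n+++ b/src/solver.py\n@@ -1 +1 @@\n-x\n+y\n"
    print(abstained(patch))  # False: source was modified, so the agent did not abstain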

Why it matters

This paper highlights a critical flaw in current coding agents: they cannot recognize when no action is required. In real-world software maintenance, where stale bug reports about already-resolved issues are common, this "action bias" accumulates technical debt and undermines agent reliability. The authors suggest re-evaluating LLM training objectives so that inaction is explicitly valued as a valid path to success.

Original Abstract

Coding agents are increasingly deployed to autonomously maintain software, including to resolve user-reported issues: a bug report comes in and the agent creates a patch to address it. However, in any real-world deployment, they will encounter stale bug reports about issues that have already been resolved. Agents should recognize this and abstain from modifying the code to avoid accumulating technical debt. To systematically evaluate whether current agents do so, we introduce FixedBench, a code benchmark with 200 human-verified coding tasks in which no code changes are required, testing five recent models across four agent harnesses. We find that even state-of-the-art models fail, proposing undesirable changes (excluding tests and documentation) in 35 to 65% of cases. Explicit instructions to reproduce the issue before patching partially address this issue but introduce a new failure mode: when an issue is partially fixed, they abstain even though a patch would still be needed. More broadly, our results indicate that LLMs fall prey to an action bias: they choose to act even if inaction would be appropriate. To break this pattern, inaction needs to be explicitly framed as a path to success, which highlights an overreliance on human guidance implicit in current training objectives.
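
As a concrete illustration of the mitigation the abstract evaluates, a reproduce-first instruction might read as follows. The wording is an assumption, not the paper's actual prompt.

    # Hypothetical reproduce-first instruction; wording is illustrative only.
    REPRODUCE_FIRST = (
        "Before proposing any patch, try to reproduce the reported issue on the "
        "current code. If it does not reproduce, the issue may already be fixed: "
        "abstain and state that no code changes are needed."
    )

Per the abstract, such instructions reduce unnecessary patches on fully fixed issues but cause agents to abstain on partially fixed ones that still need a patch.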
