ArXiv TLDR

Do AI Coding Agents Log Like Humans? An Empirical Study

2604.09409

Youssef Esseddiq Ouatiti, Mohammed Sayagh, Hao Li, Ahmed E. Hassan

cs.SE

TLDR

AI coding agents log less often than humans, struggle to follow explicit logging instructions, and rely on human intervention to repair their logging issues.

Key contributions

  • Agents log less frequently than humans but show higher log density when they do.
  • Explicit logging instructions are rare (4.7%) and largely ineffective (67% non-compliance).
  • Humans fix 72.5% of agent-generated logging issues, acting as "silent janitors."
  • Natural language logging instructions fail in practice, suggesting a need for deterministic guardrails.

Why it matters

This study reveals critical gaps in how AI coding agents handle essential logging, highlighting the limitations of natural language instructions. It underscores the need for better mechanisms, like deterministic guardrails, to ensure robust software observability and reduce human repair effort. This is crucial for developing more reliable AI-powered development tools.

Original Abstract

Software logging is essential for maintaining and debugging complex systems, yet it remains unclear how AI coding agents handle this non-functional requirement. While prior work characterizes human logging practices, the behaviors of AI coding agents and the efficacy of natural language instructions in governing them are unexplored. To address this gap, we conduct an empirical study of 4,550 agentic pull requests across 81 open-source repositories. We compare agent logging patterns against human baselines and analyze the impact of explicit logging instructions. We find that agents change logging less often than humans in 58.4% of repositories, though they exhibit higher log density when they do. Furthermore, explicit logging instructions are rare (4.7%) and ineffective, as agents fail to comply with constructive requests 67% of the time. Finally, we observe that humans perform 72.5% of post-generation log repairs, acting as "silent janitors" who fix logging and observability issues without explicit review feedback. These findings indicate a dual failure in natural language instruction (i.e., scarcity of logging instructions and low agent compliance), suggesting that deterministic guardrails might be necessary to ensure consistent logging practices.
