These Aren't the Reviews You're Looking For: How Humans Review AI-Generated Pull Requests
Kacper Duma, Patryk Wróblewski, Jagoda Bobińska, Julia Winiarska, Piotr Przymus
TL;DR
AI-generated pull requests on GitHub receive significantly less human review and more bot interaction compared to human-authored ones.
Key contributions
- Most AI-generated pull requests (PRs) receive no review at all.
- When reviewed, AI-generated PRs are primarily reviewed by AI agents, not humans.
- Human-authored PRs get more direct human feedback and human-only reviews.
- When humans do participate in AI-PR reviews, they typically steer the agents rather than evaluate the code directly.
Why it matters
This paper reveals a critical difference in how AI-generated code is reviewed, highlighting a lack of human oversight. It challenges current assumptions about review metrics in agentic development, urging re-evaluation of how we measure human involvement.
Original Abstract
We analyze code review interactions for AI-generated pull requests (PRs) on GitHub using the AIDev dataset and compare them to human-authored PRs within the same repositories. We find that most AI-generated PRs receive no review and, when reviewed, are largely dominated by AI agents rather than humans. Human-authored PRs are more likely to receive human-only review and to attract direct human feedback. In contrast, reviews of AI-generated PRs more often take the form of automation-mediated interaction, with human involvement frequently expressed through agent steering rather than standalone evaluation. These results indicate systematic differences in how review activity is structured in agentic workflows and raise challenges for interpreting review metrics as indicators of human oversight in large-scale mining studies.
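The abstract's core comparison rests on classifying each PR's review activity by who participated: no one, humans only, AI agents only, or a mix. A minimal sketch of that classification logic is below; the `author_type` field and the category names are illustrative assumptions, not the AIDev dataset's actual schema.

```python
# Hypothetical sketch of the paper's review-participation breakdown.
# The "author_type" field is an assumed, illustrative representation,
# not the real AIDev schema.

def classify_pr_reviews(reviews):
    """Label a PR's review activity as 'unreviewed', 'human-only',
    'bot-only', or 'mixed', based on who authored its reviews."""
    if not reviews:
        return "unreviewed"
    kinds = {r["author_type"] for r in reviews}
    if kinds == {"human"}:
        return "human-only"
    if kinds == {"bot"}:
        return "bot-only"
    return "mixed"

# Toy examples mirroring the paper's findings: many AI PRs are
# unreviewed or reviewed only by agents, while human PRs more often
# receive human-only review.
prs = {
    101: [],                                                # no review
    102: [{"author_type": "bot"}, {"author_type": "bot"}],  # agent-dominated
    103: [{"author_type": "human"}],                        # human-only
    104: [{"author_type": "human"}, {"author_type": "bot"}],
}
labels = {pr: classify_pr_reviews(rs) for pr, rs in prs.items()}
# → {101: 'unreviewed', 102: 'bot-only', 103: 'human-only', 104: 'mixed'}
```

Note that under this scheme a "mixed" PR still counts as having human involvement, which is exactly the interpretive pitfall the paper flags: a human comment steering an agent inflates human-review counts without constituting standalone human evaluation.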