Impact of large language models on peer review opinions from a fine-grained perspective: Evidence from top conference proceedings in AI
Wenqing Wu, Chengzhi Zhang, Yi Zhao, Tong Bao
TLDR
This study reveals that LLMs make peer reviews longer and more fluent but reduce focus on deep evaluative aspects like originality.
Key contributions
- Analyzed linguistic features and evaluation aspects of peer reviews post-LLM emergence.
- Identified potential LLM-assisted reviews using a maximum likelihood estimation method.
- Found reviews became longer, more fluent, and focused more on summaries and surface clarity.
- Revealed a decline in attention to deep evaluative dimensions like originality and replicability.
Why it matters
This paper provides crucial empirical evidence on how LLMs subtly alter academic peer review. It reveals a shift towards superficial evaluations, potentially impacting research quality and innovation. Understanding these changes is vital for maintaining the integrity of scientific communication.
Original Abstract
With the rapid advancement of Large Language Models (LLMs), the academic community has faced unprecedented disruptions, particularly in the realm of academic communication. The primary function of peer review is improving the quality of academic manuscripts along evaluation aspects such as clarity and originality. Although prior studies suggest that LLMs are beginning to influence peer review, it remains unclear whether they are altering its core evaluative functions. Moreover, the extent to which LLMs affect the linguistic form, evaluative focus, and recommendation-related signals of peer-review reports has yet to be systematically examined. In this study, we examine the changes in peer review reports for academic articles following the emergence of LLMs, emphasizing variations at a fine-grained level. Specifically, we investigate linguistic features such as the length and complexity of words and sentences in review comments, while also automatically annotating the evaluation aspects of individual review sentences. We also use a previously established maximum likelihood estimation method to identify review reports that may have been modified or generated by LLMs. Finally, we assess the impact of evaluation aspects mentioned in LLM-assisted review reports on the informativeness of recommendations for paper decision-making. The results indicate that following the emergence of LLMs, peer review texts have become longer and more fluent, with increased emphasis on summaries and surface-level clarity, as well as more standardized linguistic patterns, particularly among reviewers with lower confidence scores. At the same time, attention to deeper evaluative dimensions, such as originality, replicability, and nuanced critical reasoning, has declined.
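The maximum likelihood estimation approach the abstract refers to can be illustrated with a minimal sketch. The idea (following distributional-estimation methods of this kind) is to model a corpus of reviews as a mixture of a human-written token distribution and an LLM-generated one, then pick the mixture weight alpha that maximizes the likelihood of the observed token counts. The marker tokens and distributions below are purely illustrative assumptions, not the paper's actual data:

```python
import math

def estimate_llm_fraction(doc_counts, p_human, p_llm, grid=1000):
    """Estimate the fraction alpha of LLM-generated text in a corpus by
    maximum likelihood under a two-component mixture model:
        P(token) = (1 - alpha) * P_human(token) + alpha * P_llm(token)
    doc_counts: {token: count} observed in the corpus under study.
    p_human / p_llm: reference token distributions (dicts summing to 1).
    Returns the alpha in [0, 1] maximizing the log-likelihood (grid search).
    """
    best_alpha, best_ll = 0.0, -math.inf
    for i in range(grid + 1):
        alpha = i / grid
        ll = 0.0
        for tok, n in doc_counts.items():
            # Small floor avoids log(0) for tokens absent from a reference.
            p = (1 - alpha) * p_human.get(tok, 1e-12) + alpha * p_llm.get(tok, 1e-12)
            ll += n * math.log(p)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha

# Illustrative marker-word distributions (hypothetical numbers).
p_human = {"delve": 0.01, "notably": 0.04, "good": 0.95}
p_llm   = {"delve": 0.30, "notably": 0.40, "good": 0.30}

# Synthetic corpus: a 25/75 human/LLM blend of the two distributions.
alpha_true = 0.75
mix = {t: 10000 * ((1 - alpha_true) * p_human[t] + alpha_true * p_llm[t])
       for t in p_human}

alpha_hat = estimate_llm_fraction(mix, p_human, p_llm)
```

In practice the reference distributions are estimated from pre-LLM reviews and from known LLM outputs, and the estimate is computed per conference year to track the growth of LLM-assisted reviewing.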