The Effect of Idea Elaboration on the Automatic Assessment of Idea Originality

2604.20569

Umberto Domanti, Moritz Mock, Sergio Agnoli, Antonella De Angeli

cs.HC

TLDR

LLMs show a self-preference bias when assessing idea originality, but the bias disappears once the analysis controls for idea elaboration.

Key contributions

  • Confirmed a "self-preference bias" in LLMs, where they favored AI-generated responses when assessing originality.
  • Analyzed 4,813 Alternate Uses Task responses from higher and lower creative humans and ChatGPT-4o, rated by two trained human raters and three automatic systems (OCSAI, CLAUS, and ChatGPT-4o).
  • Found that the self-preference bias disappeared when the analyses controlled for idea elaboration.
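
To make "controlling for idea elaboration" concrete, here is a minimal sketch. It is an illustration only: the toy data are invented, elaboration is assumed to be a simple numeric score such as word count, and the regression adjustment shown is one common way to control for a covariate. This summary does not say which statistical model the authors used. In the toy data, ratings depend only on elaboration and the AI responses are more elaborated, so the raw AI-versus-human gap is positive while the adjusted gap is zero.

```python
from statistics import mean


def slope(x, y):
    """Ordinary least-squares slope of y on x (model with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx


def residuals(x, y):
    """Part of y that is not linearly explained by x."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]


def adjusted_effect(source, elaboration, rating):
    """Effect of source on rating with elaboration held fixed.

    Both variables are residualized on elaboration and the residuals are
    regressed on each other; this equals the source coefficient in a
    multiple regression of rating on source and elaboration.
    """
    return slope(residuals(elaboration, source),
                 residuals(elaboration, rating))


# Hypothetical toy data, NOT taken from the paper.
elaboration = [1, 2, 3, 4, 4, 5, 6, 7]   # assumed: e.g. words per response
source = [0, 0, 0, 0, 1, 1, 1, 1]        # 0 = human, 1 = AI
rating = [1 + 0.5 * e for e in elaboration]  # rating driven by elaboration only

raw_gap = slope(source, rating)          # mean AI rating minus mean human rating
adj_gap = adjusted_effect(source, elaboration, rating)

print(f"raw gap: {raw_gap:.2f}, adjusted gap: {adj_gap:.2f}")
```

In this sketch the apparent preference for AI responses is produced entirely by the difference in elaboration between the two groups, which is the kind of pattern the third contribution describes.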

Why it matters

This paper confirms a self-preference bias in LLMs that assess the originality of creative responses, and shows that the bias disappears once idea elaboration is controlled for. The finding matters for building automatic creativity assessment tools that align with human raters, since such tools are meant to address the cost, fatigue, and subjectivity of human assessment.

Original Abstract

Automatic systems are increasingly used to assess the originality of responses in creative tasks. They offer a potential solution to key limitations of human assessment (cost, fatigue, and subjectivity), but there is preliminary evidence of a self-preference bias. Accordingly, automatic systems tend to prefer outcomes that are more closely related to their style, rather than to the human one. In this paper, we investigated how Large Language Models (LLMs) align with human raters in assessing the originality of responses in a divergent thinking task. We analysed 4,813 responses to the Alternate Uses Task produced by higher and lower creative humans and ChatGPT-4o. Human raters were two university students who underwent intensive training. Machine raters were two specialised systems fine-tuned on AUT responses and corresponding human ratings (OCSAI and CLAUS) and ChatGPT-4o, which was prompted with the same instructions as human raters. Results confirmed the presence of a self-preference bias in LLMs. Automatic systems tended to privilege artificial responses. However, this self-preference bias disappeared when the analyses controlled for the idea elaboration. We discuss theoretical and methodological implications of these findings by highlighting future directions for research on creativity assessment.