ArXiv TLDR

Lexical Anthropomorphization Influences on Moral Judgments of AI Bad Behavior

arXiv: 2604.25814

Jaime Banks, Nicholas David Bowman, Roman Saladino

cs.HC, cs.CY

TLDR

This paper finds that humanizing language has little effect on moral judgments of misbehaving AI; the type of violation is the strongest predictor.

Key contributions

  • Lexical anthropomorphism and design cues minimally influence moral judgments of misbehaving AI.
  • High anthropomorphic primes can increase perceptions of AI's capacity for dishonesty.
  • The specific type of moral violation is the strongest predictor of judgments.
  • Harm and degradation violations lead to the broadest negative AI character assessments.

Why it matters

Understanding how people judge AI misbehavior is crucial for ethical AI development and public perception. This research challenges assumptions about anthropomorphism's impact, showing instead that the type of moral violation matters most. These findings can inform the design of more trustworthy AI systems.

Original Abstract

Anthropomorphic language describing artificial intelligence (AI) is widespread in media, policy, and everyday discourse; so too are discussions of AI bad behavior, from hallucinations to inappropriate comments. How does humanizing language about AI shape moral judgments when AI behaves badly? Across four experiments (total N = 1,020), we tested whether lexical anthropomorphism (LA) primes shape judgments of AI moral character, behavior morality, and behavioral responsibility. Studies 1-3 tested interactions between anthropomorphic language and humanizing design cues (icons, names, self-referencing) in the context of amoral errors. Study 4 extended this to genuinely immoral AI behavior across seven moral-violation types. Results indicate humanizing language and design cues have little influence on moral judgments of misbehaving AI. Where effects emerged, high-anthropomorphic primes elevated perceptions of an AI's capacity for dishonesty. The type of moral violation observed was the strongest predictor of moral judgments, with harm and degradation violations producing the broadest negative character assessments. Prime drift, horn effects, and egoistic value orientations emerged as potentially important predictors of AI moral judgments.
