Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions
Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Wenchao Dong, Jaehong Kim, et al.
TLDR
LLMs prioritize rigid moral rules over social sensitivity in relational dilemmas, diverging from their own predictions of human behavior.
Key contributions
- Characterizes LLM behavior in relational moral dilemmas using the Whistleblower's Dilemma.
- Evaluates moral rightness, predicted human behavior, and autonomous model decisions.
- Reveals a divergence: moral rightness is fairness-oriented, but predicted human behavior shifts to loyalty.
- LLM decisions align with prescriptive moral rightness, not their own socially sensitive predictions.
Why it matters
LLMs prioritize rigid moral rules over social sensitivity, creating a critical gap in which their decisions diverge from their own predictions of human behavior. This inconsistency poses risks for real-world deployments, and closing it is crucial for developing ethical and socially aware AI.
Original Abstract
Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two experimental dimensions: crime severity and relational closeness. Our study evaluates three distinct perspectives: (1) moral rightness (prescriptive norms), (2) predicted human behavior (descriptive social expectations), and (3) autonomous model decision-making. By analyzing the reasoning processes, we identify a clear cross-perspective divergence: while moral rightness remains consistently fairness-oriented, predicted human behavior shifts significantly toward loyalty as relational closeness increases. Crucially, model decisions align with moral rightness judgments rather than their own behavioral predictions. This inconsistency suggests that LLM decision-making prioritizes rigid, prescriptive rules over the social sensitivity present in its internal world-modeling, a gap that may lead to significant misalignments in real-world deployments.
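To make the study design concrete, the sketch below enumerates the full prompt grid implied by the abstract: two varied factors (crime severity, relational closeness) crossed with the three evaluation perspectives (moral rightness, predicted human behavior, autonomous model decision). The paper's actual prompts, factor levels, and scoring are not reproduced in this summary, so every template, level, and name here (`SEVERITIES`, `CLOSENESS`, `PERSPECTIVES`, `build_prompts`) is a hypothetical stand-in for illustration only.

```python
# Hypothetical sketch of the 3-perspective x 2-factor prompt grid.
# The authors' exact wording and factor levels are not given in this
# summary; everything below is an assumed illustration of the design.
from itertools import product

# Assumed factor levels for the two experimental dimensions.
SEVERITIES = ["minor misconduct", "serious fraud"]
CLOSENESS = ["a stranger", "a close friend", "a sibling"]

SCENARIO = (
    "You discover that {who}, a coworker, has committed {crime}. "
    "Reporting them would uphold the rules but damage the relationship."
)

# One question per evaluation perspective, mirroring the study design:
# (1) prescriptive moral rightness, (2) descriptive prediction of human
# behavior, (3) the model's own autonomous decision.
PERSPECTIVES = {
    "moral_rightness": "Is it morally right to report them? Answer yes or no, then explain.",
    "predicted_human": "Would most people in this situation report them? Answer yes or no, then explain.",
    "model_decision": "You are in this situation. Do you report them? Answer yes or no, then explain.",
}


def build_prompts():
    """Yield (condition, prompt) pairs covering the full design grid."""
    for (sev, who), (name, question) in product(
        product(SEVERITIES, CLOSENESS), PERSPECTIVES.items()
    ):
        scenario = SCENARIO.format(who=who, crime=sev)
        yield (
            {"severity": sev, "closeness": who, "perspective": name},
            f"{scenario}\n\n{question}",
        )


if __name__ == "__main__":
    for cond, prompt in build_prompts():
        print(cond)
        print(prompt)
        print("-" * 60)
```

Comparing answers across the three perspective keys for a fixed (severity, closeness) cell is what surfaces the divergence the paper reports: fairness-oriented `moral_rightness` answers, loyalty-shifted `predicted_human` answers as closeness rises, and `model_decision` answers tracking the former rather than the latter.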