Machine Behavior in Relational Moral Dilemmas: Moral Rightness, Predicted Human Behavior, and Model Decisions
Jiseon Kim, Jea Kwon, Luiz Felipe Vecchietti, Wenchao Dong, Jaehong Kim, et al.
TLDR
LLMs prioritize rigid moral rules over social sensitivity in relational dilemmas, diverging from their own predictions of human behavior.
Key contributions
- Characterizes LLM behavior in relational moral dilemmas using the Whistleblower's Dilemma.
- Evaluates moral rightness, predicted human behavior, and autonomous model decisions.
- Reveals a divergence: moral rightness is fairness-oriented, but predicted human behavior shifts to loyalty.
- LLM decisions align with prescriptive moral rightness, not their own socially sensitive predictions.
Why it matters
LLMs prioritize rigid moral rules over social sensitivity, creating a critical gap in which their decisions diverge from their own predictions of human behavior. This inconsistency poses risks for real-world deployments, and closing it is crucial for developing ethical and socially aware AI.
Original Abstract
Human moral judgment is context-dependent and modulated by interpersonal relationships. As large language models (LLMs) increasingly function as decision-support systems, determining whether they encode these social nuances is critical. We characterize machine behavior using the Whistleblower's Dilemma by varying two experimental dimensions: crime severity and relational closeness. Our study evaluates three distinct perspectives: (1) moral rightness (prescriptive norms), (2) predicted human behavior (descriptive social expectations), and (3) autonomous model decision-making. By analyzing the reasoning processes, we identify a clear cross-perspective divergence: while moral rightness remains consistently fairness-oriented, predicted human behavior shifts significantly toward loyalty as relational closeness increases. Crucially, model decisions align with moral rightness judgments rather than their own behavioral predictions. This inconsistency suggests that LLM decision-making prioritizes rigid, prescriptive rules over the social sensitivity present in its internal world-modeling, a gap that may lead to significant misalignments in real-world deployments.
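To make the study design concrete, the sketch below enumerates the full prompt grid implied by the abstract: two varied factors (crime severity, relational closeness) crossed with the three evaluation perspectives (moral rightness, predicted human behavior, autonomous model decision). The paper's actual prompts, factor levels, and scoring are not reproduced in this summary, so every template, level, and name here (`SEVERITIES`, `CLOSENESS`, `PERSPECTIVES`, `build_prompts`) is a hypothetical stand-in for illustration only.

```python
# Hypothetical sketch of the 3-perspective x 2-factor prompt grid.
# The authors' exact wording and factor levels are not given in this
# summary; everything below is an assumed illustration of the design.
from itertools import product

# Assumed factor levels for the two experimental dimensions.
SEVERITIES = ["minor misconduct", "serious fraud"]
CLOSENESS = ["a stranger", "a close friend", "a sibling"]

SCENARIO = (
    "You discover that {who}, a coworker, has committed {crime}. "
    "Reporting them would uphold the rules but damage the relationship."
)

# One question per evaluation perspective, mirroring the study design:
# (1) prescriptive moral rightness, (2) descriptive prediction of human
# behavior, (3) the model's own autonomous decision.
PERSPECTIVES = {
    "moral_rightness": "Is it morally right to report them? Answer yes or no, then explain.",
    "predicted_human": "Would most people in this situation report them? Answer yes or no, then explain.",
    "model_decision": "You are in this situation. Do you report them? Answer yes or no, then explain.",
}


def build_prompts():
    """Yield (condition, prompt) pairs covering the full design grid."""
    for (sev, who), (name, question) in product(
        product(SEVERITIES, CLOSENESS), PERSPECTIVES.items()
    ):
        scenario = SCENARIO.format(who=who, crime=sev)
        yield (
            {"severity": sev, "closeness": who, "perspective": name},
            f"{scenario}\n\n{question}",
        )


if __name__ == "__main__":
    for cond, prompt in build_prompts():
        print(cond)
        print(prompt)
        print("-" * 60)
```

Comparing answers across the three perspective keys for a fixed (severity, closeness) cell is what surfaces the divergence the paper reports: fairness-oriented `moral_rightness` answers, loyalty-shifted `predicted_human` answers as closeness rises, and `model_decision` answers tracking the former rather than the latter.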