ArXiv TLDR

Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment

2604.19548

Bobo Li, Rui Wu, Zibo Ji, Meishan Zhang, Hao Fei + 3 more

cs.CL cs.AI cs.CY

TLDR

This paper introduces ReTAS, a new method to mitigate Actor-Observer Asymmetry in LLM agents by enforcing perspective-invariant reasoning.

Key contributions

  • Reveals Actor-Observer Asymmetry (AOA) in multi-agent LLMs, where roles bias fault attribution.
  • Quantifies AOA with a new Ambiguous Failure Benchmark, showing that simply swapping perspectives flips fault attribution in over 20% of cases for most models (a measurement sketch follows this list).
  • Introduces ReTAS, a dialectically aligned model, to enforce perspective-invariant reasoning in LLM agents.
  • ReTAS integrates dialectical chain-of-thought to synthesize conflicting viewpoints for objective consensus.
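
The paper's benchmark code is not shown here, but a minimal sketch helps make the perspective-swap measurement concrete: present the same ambiguous failure trace once framed as self-reflection (actor) and once as auditing another agent (observer), then count how often the attribution flips. The prompt wording and the `ask_model` callable below are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch of measuring perspective-swap (AOA) bias, not the
# paper's Ambiguous Failure Benchmark code.

ACTOR_PROMPT = (
    "You are the agent whose action just failed. Trace:\n{trace}\n"
    "Was the failure caused by your own reasoning (internal) or by the "
    "environment/other agents (external)? Answer 'internal' or 'external'."
)
OBSERVER_PROMPT = (
    "You are auditing another agent whose action just failed. Trace:\n{trace}\n"
    "Was the failure caused by that agent's reasoning (internal) or by the "
    "environment/other agents (external)? Answer 'internal' or 'external'."
)

def swap_bias_rate(cases, ask_model):
    """Fraction of ambiguous failure cases where swapping the perspective
    flips the attribution. `ask_model` is any callable that sends a prompt
    to an LLM and returns its text reply."""
    flips = 0
    for trace in cases:
        actor_view = ask_model(ACTOR_PROMPT.format(trace=trace)).strip().lower()
        observer_view = ask_model(OBSERVER_PROMPT.format(trace=trace)).strip().lower()
        if actor_view != observer_view:  # same error, different attribution
            flips += 1
    return flips / len(cases)
```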

Why it matters

Multi-agent LLM systems are increasingly relied on for complex tasks, but cognitive biases like AOA can undermine their reliability. This work provides a clear diagnosis of AOA and a novel mitigation, ReTAS, that helps agents reach more objective and consistent fault resolution, improving the trustworthiness and performance of autonomous LLM agents.

Original Abstract

Large Language Model agents have rapidly evolved from static text generators into dynamic systems capable of executing complex autonomous workflows. To enhance reliability, multi-agent frameworks assigning specialized roles are increasingly adopted to enable self-reflection and mutual auditing. While such role-playing effectively leverages domain expert knowledge, we find it simultaneously induces a human-like cognitive bias known as Actor-Observer Asymmetry (AOA). Specifically, an agent acting as an actor (during self-reflection) tends to attribute failures to external factors, whereas an observer (during mutual auditing) attributes the same errors to internal faults. We quantify this using our new Ambiguous Failure Benchmark, which reveals that simply swapping perspectives triggers the AOA effect in over 20% of cases for most models. To tame this bias, we introduce ReTAS (Reasoning via Thesis-Antithesis-Synthesis), a model trained through dialectical alignment to enforce perspective-invariant reasoning. By integrating dialectical chain-of-thought with Group Relative Policy Optimization, ReTAS guides agents to synthesize conflicting viewpoints into an objective consensus. Experiments demonstrate that ReTAS effectively mitigates attribution inconsistency and significantly improves fault resolution rates in ambiguous scenarios.
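
ReTAS itself is a model trained with dialectical alignment and Group Relative Policy Optimization; as a purely illustrative analogue, the sketch below shows what a thesis-antithesis-synthesis pass over a single ambiguous failure could look like at the prompt level. The function name, prompt wording, and the `ask_model` callable are assumptions, not the paper's training pipeline.

```python
# Untrained, prompt-level analogue of dialectical chain-of-thought:
# thesis (actor view) -> antithesis (observer view) -> synthesis (consensus).

def dialectical_attribution(trace, ask_model):
    """Reconcile actor and observer accounts of one failure trace."""
    thesis = ask_model(
        "As the acting agent, explain the failure from your own perspective:\n" + trace
    )
    antithesis = ask_model(
        "As an external auditor, explain the same failure from the observer's "
        "perspective:\n" + trace
    )
    synthesis = ask_model(
        "Two accounts of one failure follow.\n"
        f"Actor account:\n{thesis}\n\nObserver account:\n{antithesis}\n\n"
        "Reconcile them into a single perspective-invariant attribution and a fix."
    )
    return synthesis
```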
