Why Expert Alignment Is Hard: Evidence from Subjective Evaluation
Tzu-Mi Lin, Wataru Hirota, Tatsuya Ishigaki, Lung-Hao Lee, Chung-Chi Chen
TLDR
This paper reveals why aligning LLMs with expert judgment in subjective tasks is difficult, highlighting heterogeneity across experts, tacit knowledge, dimension dependency, and temporal instability.
Key contributions
- Expert alignment difficulty varies significantly, reflecting diverse evaluation styles.
- Explicit criteria and reasoning don't always improve alignment, suggesting tacit knowledge plays a role.
- Editing for alignment is sensitive to both the number and the identity of examples, with small edit sets offering useful but unstable gains.
- Alignment difficulty varies by dimension; content-based dimensions are easier than those needing external knowledge.
Why it matters
This research provides crucial insights into the inherent complexities of aligning LLMs with human experts, especially in subjective domains. Understanding these challenges is vital for developing more effective and robust AI alignment strategies, moving beyond simple rule-based approaches.
Original Abstract
Aligning large language models with expert judgment is especially difficult in subjective evaluation tasks, where experts may disagree, rely on tacit criteria, and change their judgments over time. In this paper, we study expert alignment as a way to understand this difficulty. Using expert evaluations and follow-up questionnaires, we examine how different forms of expert information affect alignment and what this reveals about subjective judgment. Our findings show four consistent patterns. First, alignment difficulty varies substantially across experts, suggesting that expert evaluation styles differ widely in their distance from a model's prior behavior. Second, explicit criteria and reasoning do not always improve alignment, indicating that expert judgment is not fully captured by verbalized rules. Third, editing is sensitive to both the number and the identity of examples, with small numbers of edits providing useful but unstable gains. Fourth, alignment difficulty differs across evaluation dimensions: dimensions grounded more directly in proposal content are easier to align, while dimensions requiring external knowledge or value-based judgment remain harder. Taken together, these results suggest that expert alignment is difficult not only because of model limitations, but also because subjective evaluation is inherently heterogeneous, partly tacit, dimension-dependent, and temporally unstable.