Training-Free Cultural Alignment of Large Language Models via Persona Disagreement
Huynh Trung Kiet, Dao Sy Duy Minh, Tuan Nguyen, Chi-Nguyen Tran, Phu-Hoa Pham, et al.
TLDR
DISCA is a training-free, inference-time method that culturally aligns black-box LLMs by using within-country sociodemographic disagreement as a steering signal, reducing cultural misalignment without any fine-tuning.
Key contributions
- Introduces DISCA, a training-free, inference-time method for cultural alignment of black-box LLMs.
- Leverages within-country sociodemographic disagreement from WVS-grounded personas as a steering signal.
- Converts persona disagreement into a bounded, loss-averse logit correction for cultural steering (see the code sketch after the abstract).
- Reduces cultural misalignment by 10–24% on MultiTP (on the six backbones ≥3.8B) and 2–7% on open-ended scenarios, without changing any weights.
Why it matters
LLMs often lack cultural neutrality, especially in moral judgments. Existing alignment methods are costly or require white-box access. This work provides a scalable, inference-time solution for aligning LLMs to diverse global cultural preferences without fine-tuning.
Original Abstract
Large language models increasingly mediate decisions that turn on moral judgement, yet a growing body of evidence shows that their implicit preferences are not culturally neutral. Existing cultural alignment methods either require per-country preference data and fine-tuning budgets or assume white-box access to model internals that commercial APIs do not expose. In this work, we focus on this realistic black-box, public-data-only regime and observe that within-country sociodemographic disagreement, not consensus, is the primary steering signal. We introduce DISCA (Disagreement-Informed Steering for Cultural Alignment), an inference-time method that instantiates each country as a panel of World-Values-Survey-grounded persona agents and converts their disagreement into a bounded, loss-averse logit correction. Across 20 countries and 7 open-weight backbones (2B–70B), DISCA reduces cultural misalignment on MultiTP by 10–24% on the six backbones ≥3.8B, and 2–7% on open-ended scenarios, without changing any weights. Our results suggest that inference-time calibration is a scalable alternative to fine-tuning for serving the long tail of global moral preferences.
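To make the mechanism concrete, here is a minimal numerical sketch of disagreement-informed steering. It is an illustration under stated assumptions, not the paper's implementation: the disagreement measure (mean pairwise total-variation distance between per-persona answer distributions), the hyperparameters alpha, bound, and loss_aversion, and all function names are hypothetical choices consistent with the abstract's description of a bounded, loss-averse logit correction scaled by within-country persona disagreement.

```python
import numpy as np

# Toy sketch of disagreement-informed steering. All names, the disagreement
# metric, and the hyperparameters are illustrative assumptions, not the
# paper's exact formulation.

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over the answer-option axis."""
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def persona_disagreement(persona_logits: np.ndarray) -> float:
    """Disagreement across a persona panel, measured here as the mean
    pairwise total-variation distance between per-persona answer
    distributions (one plausible metric; the paper's may differ)."""
    probs = softmax(persona_logits)  # (n_personas, n_options)
    n = probs.shape[0]
    pairs = [0.5 * np.abs(probs[i] - probs[j]).sum()
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(pairs)) if pairs else 0.0

def steered_logits(base_logits: np.ndarray,
                   persona_logits: np.ndarray,
                   alpha: float = 1.0,         # disagreement gain (assumed)
                   bound: float = 2.0,         # clip radius (assumed)
                   loss_aversion: float = 2.0  # asymmetry factor (assumed)
                   ) -> np.ndarray:
    """Bounded, loss-averse logit correction scaled by panel disagreement.

    delta measures how far the persona panel's mean logits sit from the
    base model's; downward corrections are amplified by loss_aversion,
    and the whole correction is clipped to [-bound, +bound].
    """
    delta = persona_logits.mean(axis=0) - base_logits
    delta = np.where(delta < 0.0, loss_aversion * delta, delta)
    weight = alpha * persona_disagreement(persona_logits)
    return base_logits + np.clip(weight * delta, -bound, bound)

# Example: a 3-persona panel over 2 answer options for one country.
base = np.array([1.2, 0.3])     # base model logits
panel = np.array([[0.2, 1.5],   # persona 1
                  [0.1, 1.1],   # persona 2
                  [1.0, 0.4]])  # persona 3
print(steered_logits(base, panel))
```

In this sketch, the loss-averse asymmetry makes corrections that suppress an option act more strongly than ones that promote it, while the clip keeps a noisy panel from overriding the base model; both properties echo the "bounded, loss-averse" wording in the abstract.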