Response Time Enhances Alignment with Heterogeneous Preferences
Federico Echenique, Alireza Fallah, Baihe Huang, Michael I. Jordan
TLDR
This paper shows that augmenting preference data with user response times lets LLMs be accurately aligned with diverse human preferences, overcoming a fundamental limitation of standard choice-only methods.
Key contributions
- Standard LLM alignment fails to capture diverse human preferences due to data aggregation.
- Augmenting preference data with user response times restores identifiability of average preferences.
- Introduces a novel Drift-Diffusion Model (DDM) based estimator for heterogeneous preferences.
- Empirically outperforms baselines, converging to the true average preference even when each labeler contributes only a single choice.
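To make the role of response times concrete, here is a minimal sketch of the drift-diffusion idea, not the paper's actual estimator: a decision is modeled as noisy evidence drifting toward one of two boundaries, so the choice reveals the sign of the preference while the response time reveals its strength. The function names, boundary parameterization, and the moment-based recovery formula below are illustrative assumptions (the closed-form identities hold for a symmetric DDM with unit noise).

```python
import math
import random

def simulate_ddm(drift, boundary=1.0, sigma=1.0, dt=1e-3, rng=None):
    """Euler-Maruyama simulation of one drift-diffusion decision.

    Evidence x starts at 0 and accumulates with mean rate `drift` plus
    Gaussian noise until it hits +boundary (choice 1) or -boundary
    (choice 0). Returns (choice, response_time)."""
    rng = rng or random
    x, t = 0.0, 0.0
    while abs(x) < boundary:
        x += drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t += dt
    return (1 if x > 0 else 0), t

def estimate_drift(choices, times, boundary=1.0):
    """Moment-based drift estimate that uses choices AND response times.

    For a symmetric DDM with sigma = 1:
        P(up) - P(down) = tanh(boundary * drift)
        E[T] = (boundary / drift) * tanh(boundary * drift)
    Dividing the two identities gives
        drift = boundary * (2 * P(up) - 1) / E[T],
    so mean response time turns the choice fraction into a drift estimate."""
    p_up = sum(choices) / len(choices)
    mean_t = sum(times) / len(times)
    return boundary * (2.0 * p_up - 1.0) / mean_t
```

A choice-only estimator sees just the fraction `p_up` and must invert a nonlinear (logistic-like) map, which is where aggregation over heterogeneous labelers introduces bias; adding the essentially free mean response time supplies the second moment needed to pin down the drift directly.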
Why it matters
This paper advances the alignment of LLMs with real-world human diversity by leveraging an easily obtainable signal: response time. It enables more accurate and socially beneficial LLM policies without user-level identifiers or repeated elicitations, pointing the way toward improved data-collection pipelines.
Original Abstract
Aligning large language models (LLMs) to human preferences typically relies on aggregating pooled feedback into a single reward model. However, this standard approach assumes that all labelers share the same underlying preferences, ignoring the fact that real-world labelers are highly heterogeneous and usually anonymous. Consequently, relying solely on binary choice data fundamentally distorts the learned policy, making the true population-average preference unidentifiable. To overcome this critical limitation, we demonstrate that augmenting preference datasets with a simple, secondary signal -- the user's response time -- can restore the identifiability of the population's average preference. By modeling each decision as a Drift-Diffusion Model (DDM), we introduce a novel, consistent estimator of heterogeneous preferences that successfully corrects the distortions of standard choice-only labels. We prove that our estimator asymptotically converges to the true average preference even in extreme cases where each anonymous labeler contributes only a single choice. Empirically, across both synthetic and real-world datasets, our method consistently outperforms standard baselines that otherwise fail and plateau at a bias floor. Because response times are essentially free to record and require zero user tracking or identification, our results bring promises and open up new opportunities for future data-collection pipelines to improve the social benefit without requiring user-level identifiers or repeated elicitations.