Stayin' Aligned Over Time: Towards Longitudinal Human-LLM Alignment via Contextual Reflection and Privacy-Preserving Behavioral Data
Simret Araya Gebreegziabher, Allison E Sproul, Yinuo Yang, Chaoran Chen, Diego Gómez-Zará, et al.
TLDR
A new framework for longitudinal human-LLM alignment reveals that user preferences change over time, challenging static evaluation methods.
Key contributions
- Proposes a methodological shift from single-moment to longitudinal LLM alignment evaluation.
- Introduces a framework combining in-situ preference capture, follow-up reflection, and behavioral traces (a minimal data sketch follows this list).
- Presents BITE, a browser-based system for detecting consequential LLM interactions and prompting reflection.
- A two-week deployment study with 8 participants surfaced differences between immediate and later user preferences along dimensions such as accuracy and relevance.
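The paper does not publish a data schema for these three signal sources; as a rough illustration of how they might fit together, here is a minimal TypeScript sketch. All type and field names (`InSituCapture`, `FollowUpReflection`, `BehavioralTrace`, `consentLevel`, and so on) are hypothetical, not taken from the paper.

```typescript
// Hypothetical sketch of a longitudinal alignment signal, combining the
// framework's three sources: in-situ capture, context-triggered follow-up
// reflection, and privacy-preserving behavioral traces. All names are
// illustrative; the paper does not specify a concrete schema.

type Rating = { accuracy: number; relevance: number; overall: number };

interface InSituCapture {
  interactionId: string;
  capturedAt: Date; // elicited immediately after the LLM interaction
  rating: Rating;
}

interface FollowUpReflection {
  interactionId: string;
  triggeredAt: Date; // fired at a later, context-detected decision point
  trigger: "revisit" | "outcome-observed" | "scheduled";
  rating: Rating; // the same dimensions, re-elicited
}

interface BehavioralTrace {
  interactionId: string;
  consentLevel: "none" | "aggregate" | "full"; // progressive, user-controlled
  events: string[]; // e.g., page revisits; shared only per consentLevel
}

// Preference drift is then just the per-dimension delta between the
// immediate rating and the later reflection.
function preferenceDrift(first: InSituCapture, later: FollowUpReflection): Rating {
  return {
    accuracy: later.rating.accuracy - first.rating.accuracy,
    relevance: later.rating.relevance - first.rating.relevance,
    overall: later.rating.overall - first.rating.overall,
  };
}
```

Pairing both ratings on a shared `interactionId` is what makes the comparison longitudinal: the same output is judged twice, and the delta, not either single rating, is the alignment signal.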
Why it matters
Current LLM alignment methods treat user preferences as static; this paper challenges that assumption and introduces a longitudinal framework and system (BITE) to capture how preferences evolve over time. Grounding alignment evaluation in later, real-world outcomes rather than only immediate feedback is a step toward AI systems that stay aligned as consequences unfold.
Original Abstract
Current human-AI alignment and evaluation methods for large language models (LLMs) often rely on preference signals collected immediately after an interaction. This practice implicitly treats preference as static, even though many LLM-mediated decisions unfold over time and may be re-evaluated differently after real-world consequences and observed outcomes. Therefore, we argue for a methodological shift from single-moment preference elicitation to longitudinal, context-situated alignment measurement. We present a methodological framework for collecting temporally grounded alignment signals by combining (1) in-situ preference capture, (2) context-triggered follow-up preference reflection, and (3) privacy-preserving behavioral traces that help interpret preference change. As an instantiation of this methodology, we introduce BITE, a browser-based system that detects consequential LLM interactions, prompts reflection across later decision points, and supports progressive, user-controlled consent for sharing behavioral data. Through a two-week longitudinal deployment study with 8 participants, our approach surfaced differences between immediate and later user preferences in accuracy, relevance, and other dimensions of the LLM output. Our findings highlight the limitations of single-moment preference datasets and underscore the importance of longitudinal methods for alignment evaluation in everyday use.
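BITE's actual detection and scheduling logic is not described beyond the abstract. The following TypeScript sketch shows one plausible shape of the loop: flag a consequential interaction, then queue a later reflection prompt. The keyword heuristic, function names, and timing are all assumptions made for illustration, not BITE's implementation.

```typescript
// Illustrative sketch of a BITE-style loop: detect a consequential LLM
// interaction, then schedule a follow-up reflection prompt for later.
// The detection heuristic and timing below are assumptions; the paper
// does not disclose BITE's actual detector or scheduling policy.

// Topics treated as "consequential" for this sketch (health, money, career...).
const CONSEQUENTIAL_HINTS = /\b(health|medical|invest|loan|job|legal|visa)\b/i;

function isConsequential(prompt: string): boolean {
  return CONSEQUENTIAL_HINTS.test(prompt);
}

function scheduleReflection(interactionId: string, delayMs: number): void {
  // A real browser extension would persist this across sessions
  // (e.g., via alarms/storage APIs); setTimeout keeps the sketch self-contained.
  setTimeout(() => {
    console.log(
      `Reflection prompt for ${interactionId}: ` +
        `"Looking back, how accurate and relevant was the answer you received?"`
    );
  }, delayMs);
}

// Usage: a user asks an LLM about a loan decision; the heuristic flags it
// and queues a reflection prompt (after 5 seconds here, for demonstration).
const userPrompt = "Should I take a fixed-rate loan for my first home?";
if (isConsequential(userPrompt)) {
  scheduleReflection("interaction-042", 5_000);
}
```

In the deployed system the delay would correspond to a later decision point (a revisit, an observed outcome), which is what lets the second elicitation capture preference change rather than a repeated snap judgment.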