Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation
Ruijun Chen, Chongming Gao, Jiawei Chen, Weiqin Yang, Xiangnan He
TLDR
BLADE introduces a Bayesian framework for LLM-based recommenders to dynamically optimize list-wise metrics, outperforming static methods.
Key contributions
- Identifies and addresses two key limitations in LLM recommender alignment: Indiscriminate Supervision and Gradient Decay.
- Introduces BLADE, a Bayesian framework for dynamic, list-wise alignment in LLM-based recommendation.
- BLADE dynamically updates its target distribution using historical priors and current model rollouts.
- Significantly outperforms state-of-the-art baselines, achieving sustained gains in ranking accuracy, fairness, and diversity.
Why it matters
This paper matters because it tackles the critical challenge of optimizing complex, non-differentiable list-wise metrics in LLM-based recommenders. BLADE provides a novel, dynamic solution that adapts to model improvements, overcoming the limitations of static methods. This leads to more accurate, fair, and diverse recommendations, crucial for real-world applications.
Original Abstract
Large Language Models have revolutionized recommender systems (LLM4Rec) by leveraging their generative capabilities to model complex user preferences. However, existing LLM4Rec methods primarily rely on token-level objectives, making it difficult to optimize list-level and non-differentiable metrics (e.g., NDCG, fairness) that define actual recommendation quality. While Best-of-N (BoN) directly optimizes these metrics during inference, its high computational cost hinders real-world deployment. To address this, BoN Alignment aims to distill the search capability into the model itself, yet current approaches suffer from two critical limitations: (1) Indiscriminate Supervision, where the static reference fails to distinguish the relative quality of candidates exceeding its empirical range, leading to a loss of ranking guidance; and (2) Gradient Decay, where the effective supervision signal rapidly diminishes as the evolving policy improves, resulting in inefficient optimization. To overcome these challenges, we propose BLADE (Bayesian List-wise Alignment via Dynamic Estimation). Unlike static approaches, BLADE introduces a Bayesian framework that continuously updates the target distribution by fusing historical priors with dynamic evidence from the model's current rollouts. This mechanism constructs a self-evolving target that adapts to the model's growing capabilities, ensuring the training signal remains informative throughout the learning process. Extensive experiments on three real-world datasets demonstrate that BLADE significantly outperforms state-of-the-art baselines. Crucially, it breaks the static performance upper bound, achieving sustained gains in both ranking accuracy (Recall, NDCG) and complex list-wise metrics (Fairness, Diversity). The code is available via https://github.com/RegionCh/BLADE.
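The abstract describes BLADE's core mechanism as fusing a historical prior with evidence from the model's current rollouts to form a self-evolving target. The paper does not spell out the exact update rule here, but the idea can be illustrated with a standard conjugate Gaussian update; the function name, the Gaussian assumption, and the `obs_var` parameter below are illustrative choices, not the authors' implementation.

```python
def bayesian_target_update(prior_mean, prior_var, rollout_rewards, obs_var=1.0):
    """Illustrative sketch: fuse a historical prior (Gaussian over the target
    reward) with evidence from the current policy's rollouts via a conjugate
    Gaussian update. Not the paper's actual algorithm."""
    n = len(rollout_rewards)
    if n == 0:
        # No new evidence: the target distribution stays at the prior.
        return prior_mean, prior_var
    sample_mean = sum(rollout_rewards) / n
    # Posterior precision is the sum of prior precision and evidence precision,
    # so the target tightens as more rollouts accumulate.
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    # Posterior mean is a precision-weighted average: as the policy improves
    # and rollout rewards rise, the target shifts upward with them.
    post_mean = post_var * (prior_mean / prior_var + n * sample_mean / obs_var)
    return post_mean, post_var
```

Under this kind of update, the target tracks the evolving policy: a static reference would keep supervising against a fixed distribution, while here each batch of rollouts pulls the target toward the model's current capability, which is the intuition behind avoiding the Gradient Decay the abstract describes.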