Beyond Static Best-of-N: Bayesian List-wise Alignment for LLM-based Recommendation
Ruijun Chen, Chongming Gao, Jiawei Chen, Weiqin Yang, Xiangnan He
TLDR
BLADE introduces a Bayesian framework for LLM-based recommenders to dynamically optimize list-wise metrics, outperforming static methods.
Key contributions
- Identifies and addresses two key limitations in LLM recommender alignment: Indiscriminate Supervision and Gradient Decay.
- Introduces BLADE, a Bayesian framework for dynamic, list-wise alignment in LLM-based recommendation.
- BLADE dynamically updates its target distribution using historical priors and current model rollouts.
- Significantly outperforms state-of-the-art baselines, achieving sustained gains in ranking accuracy, fairness, and diversity.
Why it matters
This paper matters because it tackles the critical challenge of optimizing complex, non-differentiable list-wise metrics in LLM-based recommenders. BLADE provides a novel, dynamic solution that adapts to model improvements, overcoming the limitations of static methods. This leads to more accurate, fair, and diverse recommendations, crucial for real-world applications.
Original Abstract
Large Language Models have revolutionized recommender systems (LLM4Rec) by leveraging their generative capabilities to model complex user preferences. However, existing LLM4Rec methods primarily rely on token-level objectives, making it difficult to optimize list-level and non-differentiable metrics (e.g., NDCG, fairness) that define actual recommendation quality. While Best-of-N (BoN) directly optimizes these metrics during inference, its high computational cost hinders real-world deployment. To address this, BoN Alignment aims to distill the search capability into the model itself, yet current approaches suffer from two critical limitations: (1) Indiscriminate Supervision, where the static reference fails to distinguish the relative quality of candidates exceeding its empirical range, leading to a loss of ranking guidance; and (2) Gradient Decay, where the effective supervision signal rapidly diminishes as the evolving policy improves, resulting in inefficient optimization. To overcome these challenges, we propose BLADE (Bayesian List-wise Alignment via Dynamic Estimation). Unlike static approaches, BLADE introduces a Bayesian framework that continuously updates the target distribution by fusing historical priors with dynamic evidence from the model's current rollouts. This mechanism constructs a self-evolving target that adapts to the model's growing capabilities, ensuring the training signal remains informative throughout the learning process. Extensive experiments on three real-world datasets demonstrate that BLADE significantly outperforms state-of-the-art baselines. Crucially, it breaks the static performance upper bound, achieving sustained gains in both ranking accuracy (Recall, NDCG) and complex list-wise metrics (Fairness, Diversity). The code is available via https://github.com/RegionCh/BLADE.
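The abstract describes BLADE's core mechanism as fusing a historical prior with evidence from the model's current rollouts to form a self-evolving target. The paper does not spell out the exact update rule here, but the idea can be illustrated with a standard conjugate Gaussian update; the function name, the Gaussian assumption, and the `obs_var` parameter below are illustrative choices, not the authors' implementation.

```python
def bayesian_target_update(prior_mean, prior_var, rollout_rewards, obs_var=1.0):
    """Illustrative sketch: fuse a historical prior (Gaussian over the target
    reward) with evidence from the current policy's rollouts via a conjugate
    Gaussian update. Not the paper's actual algorithm."""
    n = len(rollout_rewards)
    if n == 0:
        # No new evidence: the target distribution stays at the prior.
        return prior_mean, prior_var
    sample_mean = sum(rollout_rewards) / n
    # Posterior precision is the sum of prior precision and evidence precision,
    # so the target tightens as more rollouts accumulate.
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    # Posterior mean is a precision-weighted average: as the policy improves
    # and rollout rewards rise, the target shifts upward with them.
    post_mean = post_var * (prior_mean / prior_var + n * sample_mean / obs_var)
    return post_mean, post_var
```

Under this kind of update, the target tracks the evolving policy: a static reference would keep supervising against a fixed distribution, while here each batch of rollouts pulls the target toward the model's current capability, which is the intuition behind avoiding the Gradient Decay the abstract describes.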