From Local Indices to Global Identifiers: Generative Reranking for Recommender Systems via Global Action Space

April 28, 2026

arXiv:2604.25291

Pengyue Jia, Xiaobei Wang, Yingyi Zhang, Shuchang Liu, Yupeng Hou + 12 more

cs.IR

TLDR

GloRank introduces a generative reranking framework for recommender systems that uses global item identifiers instead of local indices, improving item understanding and performance.

Key contributions

Proposes GloRank, a generative reranking framework for recommender systems.
Shifts reranking from local indices to generating global item identifiers using discrete tokens.
Decouples item scoring from input order, so every item is evaluated against a consistent global standard.
Employs a two-stage optimization: supervised pre-training followed by RL-based post-training.
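
To make the contrast between local indices and global identifiers concrete, here is a minimal sketch. The item names, token values, and helper functions are all hypothetical and are not taken from the paper; the sketch only illustrates why a local index changes meaning when the input order changes, while a global identifier does not.

```python
# Hypothetical illustration; item names, tokens, and helpers are assumptions,
# not the paper's actual tokenizer or model.

def local_action_space(candidates):
    """Map each output slot (local index) to the item it denotes in this sample."""
    return {slot: item for slot, item in enumerate(candidates)}


def global_action_space(candidates, item_tokens):
    """Map each candidate to its fixed token sequence, independent of input order."""
    return {item: item_tokens[item] for item in candidates}


# Toy global identifiers: each item is a short sequence of discrete tokens.
ITEM_TOKENS = {
    "item_a": (3, 17),
    "item_b": (3, 42),
    "item_c": (8, 5),
}

sample_1 = ["item_a", "item_b", "item_c"]
sample_2 = ["item_c", "item_a", "item_b"]  # same items, different input order

local_1 = local_action_space(sample_1)
local_2 = local_action_space(sample_2)
global_1 = global_action_space(sample_1, ITEM_TOKENS)
global_2 = global_action_space(sample_2, ITEM_TOKENS)

# Slot 0 denotes a different item in each sample, so its meaning is unstable.
slot0_changes = local_1[0] != local_2[0]
# Every item keeps the same identifier in both samples.
ids_stable = global_1 == global_2
```

In this toy setup, slot 0 points at `item_a` in the first sample and `item_c` in the second, whereas the token sequence for each item is identical across both samples.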

Why it matters

Index-based rerankers tie each output slot to whichever item happens to occupy that position in the input, which prevents the model from building a stable understanding of individual items. GloRank generates global identifiers instead, so each item is scored the same way regardless of input order. The abstract reports consistent gains over state-of-the-art baselines on two public benchmarks, a large-scale industrial dataset, and online A/B tests, along with greater robustness in cold-start scenarios.

Original Abstract

In modern recommender systems, list-wise reranking serves as a critical phase within the multi-stage pipeline, finalizing the exposed item sequence and directly impacting user satisfaction by modeling complex intra-list item dependencies. Existing methods typically formulate this task as selecting indices from the local input list. However, this approach suffers from a semantically inconsistent action space: the same output neuron (logits) represents different items across different samples, preventing the model from establishing a stable, intrinsic understanding of the items. To address this, we propose GloRank (Global Action Space Ranker), a generative framework that shifts reranking from selecting local indices to generating global identifiers. Specifically, we represent items as sequences of discrete tokens and reformulate reranking as a token generation task. This design effectively decouples the scoring mechanism from the variable input order, ensuring that items are evaluated against a consistent global standard. We further enhance this with a two-stage optimization pipeline: a supervised pre-training phase to initialize the model with high-quality demonstrations, followed by a reinforcement learning-based post-training phase to directly maximize list-wise utility. Extensive experiments on two public benchmarks and a large-scale industrial dataset, coupled with online A/B tests, demonstrate that GloRank consistently outperforms state-of-the-art baselines and achieves superior robustness in cold-start scenarios.
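
The idea of reranking as token generation can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: it assumes fixed-length, unique token sequences per item, uses greedy decoding restricted to prefixes of the remaining candidates, and takes a stand-in `score` function in place of a trained model. The names `generate_list` and `toy_score` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Tokens = Tuple[int, ...]


def generate_list(
    candidates: Dict[str, Tokens],
    score: Callable[[Sequence[int], int], float],
    list_len: int,
) -> List[str]:
    """Greedily emit item identifiers token by token.

    Assumes every identifier has the same length and is unique (an assumption
    of this sketch). At each step only tokens that continue some remaining
    candidate's identifier are allowed.
    """
    remaining = dict(candidates)
    history: List[int] = []  # all tokens generated so far
    output: List[str] = []
    while remaining and len(output) < list_len:
        prefix: Tokens = ()
        # Extend the prefix until it spells out one full remaining identifier.
        while prefix not in remaining.values():
            pos = len(prefix)
            allowed = {t[pos] for t in remaining.values() if t[:pos] == prefix}
            # sorted() makes tie-breaking deterministic.
            best = max(sorted(allowed), key=lambda tok: score(history + list(prefix), tok))
            prefix += (best,)
        item = next(name for name, t in remaining.items() if t == prefix)
        output.append(item)
        history.extend(prefix)
        del remaining[item]  # each item is placed at most once
    return output


def toy_score(context: Sequence[int], token: int) -> float:
    """Stand-in for a learned model: simply prefers smaller token ids."""
    return -float(token)


cands = {"item_c": (8, 5), "item_a": (3, 17), "item_b": (3, 42)}
shuffled = {"item_b": (3, 42), "item_c": (8, 5), "item_a": (3, 17)}

ranking = generate_list(cands, toy_score, list_len=3)
ranking_shuffled = generate_list(shuffled, toy_score, list_len=3)
```

Because the scorer sees only token identifiers and the generated history, `ranking` and `ranking_shuffled` come out the same here even though the candidates arrive in a different order. A real model would also condition on user context and on the candidate set itself.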
