On the Equivalence Between Auto-Regressive Next Token Prediction and Full-Item-Vocabulary Maximum Likelihood Estimation in Generative Recommendation--A Short Note
Yusheng Huang, Shuang Yang, Zhaojie Liu, Han Li
TLDR
This paper proves that auto-regressive next-token prediction in generative recommendation is mathematically equivalent to full-item-vocabulary maximum likelihood estimation.
Key contributions
- Proves k-token auto-regressive next-token prediction (AR-NTP) is equivalent to full-item-vocabulary maximum likelihood estimation (FV-MLE).
- Shows the equivalence holds under a bijective mapping between items and their corresponding k-token sequences.
- Demonstrates this equivalence for both cascaded and parallel tokenization schemes used in industrial GR systems.
- Offers the first formal theoretical foundation for the dominant industrial generative recommendation paradigm.
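The core claim can be illustrated numerically. Below is a minimal sketch (all names and numbers are hypothetical toys, not from the paper): under a bijective item-to-token mapping, the chain-rule product of per-step token probabilities induces a valid distribution over the item vocabulary, and the summed per-token next-token-prediction loss equals the negative log-likelihood of the target item under that full-item-vocabulary distribution.

```python
import math

# Toy setup: 4 items, each bijectively mapped to a 2-token sequence
# over the token alphabet {0, 1} (cascaded tokenization).
item_to_tokens = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}

# A toy auto-regressive "model":
# p_step1[t]          = P(token1 = t | context)
# p_step2[prefix][t]  = P(token2 = t | context, token1 = prefix)
p_step1 = {0: 0.7, 1: 0.3}
p_step2 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.2, 1: 0.8}}

def item_prob(item):
    """P(item | context) induced by the token model via the chain rule."""
    t1, t2 = item_to_tokens[item]
    return p_step1[t1] * p_step2[t1][t2]

# Bijection => the induced item probabilities form a valid distribution.
assert abs(sum(item_prob(i) for i in item_to_tokens) - 1.0) < 1e-12

# AR-NTP training loss for a target item: sum of per-token NLL terms.
target = 2
t1, t2 = item_to_tokens[target]
ar_ntp_loss = -math.log(p_step1[t1]) - math.log(p_step2[t1][t2])

# FV-MLE loss: NLL of the same item under the full-item-vocab distribution.
fv_mle_loss = -math.log(item_prob(target))

assert abs(ar_ntp_loss - fv_mle_loss) < 1e-12  # the two objectives coincide
```

The equality is just `-log(a * b) = -log a - log b` applied position by position; the bijection is what guarantees the token-level product is a proper probability over items rather than over arbitrary token strings.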
Why it matters
This paper provides the first formal theoretical foundation for the widely adopted auto-regressive next-token prediction paradigm in generative recommendation. It offers principled guidance for optimizing future GR systems by clarifying their underlying mathematical mechanism.
Original Abstract
Generative recommendation (GR) has emerged as a widely adopted paradigm in industrial sequential recommendation. Current GR systems follow a similar pipeline: tokenization for item indexing, next-token prediction as the training objective and auto-regressive decoding for next-item generation. However, existing GR research mainly focuses on architecture design and empirical performance optimization, with few rigorous theoretical explanations for the working mechanism of auto-regressive next-token prediction in recommendation scenarios. In this work, we formally prove that **the k-token auto-regressive next-token prediction (AR-NTP) paradigm is strictly mathematically equivalent to full-item-vocabulary maximum likelihood estimation (FV-MLE)**, under the core premise of a bijective mapping between items and their corresponding k-token sequences. We further show that this equivalence holds for both cascaded and parallel tokenizations, the two most widely used schemes in industrial GR systems. Our result provides the first formal theoretical foundation for the dominant industrial GR paradigm, and offers principled guidance for future GR system optimization.
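The abstract also covers parallel tokenization. Assuming the standard reading of that scheme (each token position predicted given only the context, with no conditioning on earlier tokens), the same bookkeeping goes through: the per-position product still induces a valid item distribution under the bijection, and the summed per-position losses equal the item-level NLL. A minimal sketch with hypothetical toy numbers:

```python
import math

# Toy parallel tokenization: 4 items bijectively mapped to 2-token
# sequences; each position has its own distribution given the context,
# independent of the other position's token.
item_to_tokens = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
p_pos = [{0: 0.7, 1: 0.3},   # P(token at position 1 | context)
         {0: 0.4, 1: 0.6}]   # P(token at position 2 | context)

def item_prob(item):
    """P(item | context) induced by the independent per-position model."""
    prob = 1.0
    for j, tok in enumerate(item_to_tokens[item]):
        prob *= p_pos[j][tok]
    return prob

# Bijection onto all 2-token sequences => probabilities sum to 1.
assert abs(sum(item_prob(i) for i in item_to_tokens) - 1.0) < 1e-12

# Summed per-position NLL equals the item-level NLL (FV-MLE loss).
target = 1
ntp_loss = -sum(math.log(p_pos[j][tok])
                for j, tok in enumerate(item_to_tokens[target]))
assert abs(ntp_loss + math.log(item_prob(target))) < 1e-12
```

Note the toy map is bijective onto the *entire* set of 2-token sequences; if some sequences mapped to no item, the induced item probabilities would sum to less than one and the equivalence would need renormalization.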