Expressiveness Limits of Autoregressive Semantic ID Generation in Generative Recommendation
Yupeng Hou, Haven Kim, Clark Mingxuan Ju, Eduardo Escoto, Neil Shah + 1 more
TLDR
Autoregressive semantic ID generation in generative recommendation models limits their expressiveness because of the tree-structured decoding space it induces; Latte mitigates this by injecting latent tokens, improving performance.
Key contributions
- Identifies that the tree-structured semantic ID decoding of GR models limits expressiveness: items that are close in the decoding tree receive similar probabilities, making them hard to differentiate (see the sketch after this list).
- Theoretically proves that these structural correlations prevent GR models from capturing even simple user preference patterns that conventional collaborative filtering models handle well.
- Introduces Latte, a simple modification that injects a latent token before each semantic ID, reshaping the single decoding tree into multiple latent-token-conditioned trees and relaxing tree-induced probability coupling.
- Latte yields an average 3.45% relative improvement in NDCG@10, enhancing generative recommendation performance.
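The coupling described in the first two points follows from how item scores are computed: an item's probability is the product of per-token conditionals along its root-to-leaf path in the decoding tree, so items sharing a long prefix share all but their final factors. A minimal Python sketch with hypothetical numbers (not from the paper) makes this concrete:

```python
# Minimal sketch of tree-induced probability coupling (hypothetical numbers,
# not the authors' code). An item's probability is the product of conditional
# token probabilities p(token_t | user, prefix) along its decoding path.
import math

def item_log_prob(token_log_probs):
    """Sum of per-token log-probabilities along one root-to-leaf path."""
    return sum(token_log_probs)

# Items A and B share their first two semantic ID tokens (same subtree);
# item C branches off at the root.
item_a = [math.log(0.6), math.log(0.5), math.log(0.4)]  # path s1 -> s2 -> s3
item_b = [math.log(0.6), math.log(0.5), math.log(0.3)]  # path s1 -> s2 -> s3'
item_c = [math.log(0.1), math.log(0.7), math.log(0.9)]  # path s1' -> ...

for name, path in [("A", item_a), ("B", item_b), ("C", item_c)]:
    print(name, round(math.exp(item_log_prob(path)), 4))

# A and B can differ only through their final factor, so for any given user
# their scores stay close, whereas C can be scored very differently.
```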
Why it matters
This paper uncovers a fundamental expressiveness limitation in generative recommendation models that stems from their tree-structured item ID generation, and shows theoretically how this structure hinders the modeling of user-specific preferences. The proposed Latte method reshapes the decoding space with latent tokens, yielding an average 3.45% relative improvement in NDCG@10.
Original Abstract
Generative recommendation (GR) models generate items by autoregressively producing a sequence of discrete tokens that jointly index the target item. However, this autoregressive generation process also induces a structured decoding space whose impact on model expressiveness remains underexplored. Specifically, token-by-token generation can be viewed as traversing a decoding tree induced by semantic ID tokens, where leaf nodes correspond to candidate items. We observe that the item probabilities produced by GR models are strongly correlated with this tree structure: items that are close in the tree tend to receive similar probabilities for any given user, making it difficult to distinguish among them based on user-specific preferences. We further show theoretically that such structural correlations prevent GR models from representing even simple patterns that can be well captured by conventional collaborative filtering models. To mitigate this issue, we propose Latte, a simple modification that injects a latent token before each semantic ID, reshaping the decoding space from a single tree into multiple latent-token-conditioned trees. This design creates multiple paths with varying tree distances between items, relaxing tree-induced probability coupling and yielding an average of 3.45% relative improvement on NDCG@10. Our code is available at https://github.com/hyp1231/Latte.
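As a rough illustration of the Latte idea described in the abstract, the snippet below interleaves a latent token before each semantic ID token, so the same pair of items can be decoded along paths with different tree distances. The token names, the number of latent tokens, and the exact interleaving scheme are assumptions made for illustration, not the authors' implementation:

```python
# Illustrative sketch (assumptions, not the authors' code): injecting a latent
# token before each semantic ID token turns one decoding tree into multiple
# latent-token-conditioned trees.

def interleave_latent(semantic_ids, latent_ids):
    """Interleave one latent token before each semantic ID token.

    semantic_ids: e.g. ["s_12", "s_7", "s_3"], the tokens indexing one item
    latent_ids:   e.g. ["z_0", "z_2", "z_1"], one latent token per level
    returns:      ["z_0", "s_12", "z_2", "s_7", "z_1", "s_3"]
    """
    assert len(semantic_ids) == len(latent_ids)
    out = []
    for z, s in zip(latent_ids, semantic_ids):
        out.extend([z, s])
    return out

# Two items sharing their first two semantic ID tokens sit in the same subtree
# of the original decoding tree ...
item_a = ["s_12", "s_7", "s_3"]
item_b = ["s_12", "s_7", "s_9"]

# ... but under different latent-token assignments they are decoded along
# separate paths, so their probabilities no longer have to share every prefix
# factor.
print(interleave_latent(item_a, ["z_0", "z_2", "z_1"]))
print(interleave_latent(item_b, ["z_3", "z_0", "z_2"]))
```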