LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation
Yiwen Chen, Fuwei Zhang, Zehao Chen, Deqing Wang, Hehan Li + 6 more
TLDR
LASAR enables efficient, high-quality generative recommendation through latent adaptive semantic aligned reasoning, running roughly 20× faster than generating explicit Chain-of-Thought text.
Key contributions
- Bridges the gap between Semantic IDs and latent reasoning through a novel two-stage training framework.
- Mitigates representation drift via explicit Chain-of-Thought semantic alignment using hidden-state anchors.
- Dynamically allocates reasoning steps per sample with a Policy Head, balancing inference efficiency against recommendation quality.
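The semantic-alignment contribution above can be illustrated with a minimal sketch. This is an assumption-laden toy version, not the paper's implementation: each latent reasoning state and each CoT hidden-state anchor is softmax-normalized into a distribution, and the two are pulled together with a step-wise bidirectional (symmetrized) KL divergence. The function names and the choice of softmax normalization are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the hidden dimension.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_kl(latent_states, anchor_states):
    """Step-wise bidirectional KL between latent reasoning states and
    CoT hidden-state anchors (a sketch; the paper's projection and
    normalization may differ).

    latent_states, anchor_states: (num_steps, hidden_dim) arrays,
    one row per reasoning step.
    """
    p = softmax(latent_states)
    q = softmax(anchor_states)
    kl_pq = np.sum(p * (np.log(p) - np.log(q)), axis=-1)  # KL(p || q) per step
    kl_qp = np.sum(q * (np.log(q) - np.log(p)), axis=-1)  # KL(q || p) per step
    # Average the symmetrized divergence over reasoning steps.
    return float(np.mean(kl_pq + kl_qp))
```

By construction the loss is zero when the latent trajectory exactly matches the anchors and positive otherwise, which is what lets it act as a trajectory constraint during training.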
Why it matters
LLMs offer powerful reasoning, but token-by-token CoT generation is too slow for real-time recommender systems. LASAR makes latent reasoning practical for generative recommendation by solving key challenges in semantic alignment and adaptive reasoning depth, boosting both efficiency and recommendation quality and thereby enabling LLM reasoning in latency-sensitive applications.
Original Abstract
Large Language Models (LLMs) have demonstrated powerful reasoning capabilities through Chain-of-Thought (CoT) in various tasks, yet the inefficiency of token-by-token generation hinders real-world deployment in latency-sensitive recommender systems. Latent reasoning has emerged as an effective paradigm in LLMs, performing multi-step inference in a continuous hidden-state space to achieve stronger reasoning at lower cost. However, this paradigm remains underexplored in mainstream generative recommendation. Adapting it reveals three unique challenges: (1) the gap between prior-less Semantic ID (SID) symbols and continuous latent reasoning - SIDs lack pre-trained semantics, hindering joint optimization; (2) representation drift due to a lack of reasoning chain supervision; and (3) the suboptimality of applying a globally fixed reasoning depth. To address these, we propose LASAR (Latent Adaptive Semantic Aligned Reasoning), an SFT-then-RL framework. First, we bridge this gap via two-stage training: Stage 1 grounds SID semantics before Stage 2 introduces latent reasoning, ensuring efficient convergence. Second, we mitigate representation drift through explicit CoT semantic alignment. Step-wise bidirectional KL divergence constrains the latent reasoning trajectory using hidden-state anchors extracted from CoT text, while a Policy Head predicts per-sample reasoning depth. Third, during the GRPO-based RL phase, terminal-only KL alignment accommodates variable-length reasoning, and REINFORCE optimizes the Policy Head to dynamically allocate steps. This nearly halves the average latent step count while simultaneously improving recommendation quality. Experiments on three real-world datasets demonstrate that LASAR outperforms all baselines. It adds marginal inference latency and is roughly 20 times faster than generating explicit CoT text.
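The abstract's Policy Head with REINFORCE can be sketched as follows. This is a hypothetical, NumPy-only illustration: the head is modeled as per-sample logits over possible depths, a depth is sampled from the resulting categorical distribution, and a REINFORCE loss with a mean-reward baseline (the baseline is my assumption, not stated in the abstract) scores the sampled depths. The paper's actual head is a learned projection trained jointly with the LLM.

```python
import numpy as np

def sample_depths(logits, rng):
    """Policy-Head-style sampling of per-sample latent reasoning depth.

    logits: (batch, max_steps) scores over candidate depths 1..max_steps
    (hypothetical shape). Returns sampled depths and their log-probs.
    """
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    # Draw one depth index per sample from its categorical distribution.
    idx = np.array([rng.choice(len(p), p=p) for p in probs])
    log_probs = np.log(probs[np.arange(len(idx)), idx])
    return idx + 1, log_probs  # depths in 1..max_steps

def reinforce_loss(log_probs, rewards):
    # REINFORCE with a mean-reward baseline for variance reduction.
    advantage = rewards - rewards.mean()
    return float(-(advantage * log_probs).mean())
```

Depths that earn above-average reward get their log-probability pushed up, and below-average depths pushed down, which is how the head can learn to spend fewer latent steps on easy samples, consistent with the abstract's claim of nearly halving the average step count.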