TwiSTAR: Think Fast, Think Slow, Then Act - Generative Recommendation with Adaptive Reasoning
Shiteng Cao, Kaian Jiang, Yunlong Gong, Zhiheng Li
TLDR
TwiSTAR introduces an adaptive reasoning framework for generative recommendation, balancing speed and accuracy by dynamically selecting inference strategies.
Key contributions
- Proposes TwiSTAR, an adaptive framework for generative recommendation with Semantic IDs.
- Employs a planner to dynamically invoke a fast SID-based retriever, a lightweight candidate ranker, or a slow reasoning model.
- The slow reasoning model generates explicit rationales and integrates collaborative commonsense knowledge.
- Achieves consistent accuracy gains and reduced inference latency over uniform slow reasoning.
Why it matters
Existing generative recommenders struggle with balancing speed and accuracy across diverse user histories. TwiSTAR solves this by adaptively applying reasoning, improving performance on hard cases while saving computation on easy ones. This leads to more efficient and effective recommendation systems.
Original Abstract
Generative recommendation with Semantic IDs (SIDs) has emerged as a promising paradigm, yet existing methods apply a fixed inference strategy, either fast direct generation or slow chain-of-thought reasoning, uniformly across all user histories. This approach creates a trade-off: the fast recommendation model produces suboptimal accuracy on hard samples, while always invoking slow reasoning incurs prohibitive latency and wastes computation on easy cases. To address this, we propose Think Fast, Think Slow, Then Act, a framework that learns to adaptively allocate reasoning effort per user sequence. Our system equips an LLM with three complementary tools: a fast SID-based retriever, a lightweight candidate ranker, and a slow reasoning model that generates explicit rationales before recommending. Crucially, we inject collaborative commonsense into the slow model by transforming item-to-item knowledge into natural language explanations. A planner, trained through supervised warm-up followed by agentic reinforcement learning, dynamically decides which tool to invoke. Experiments on three datasets demonstrate that our method outperforms strong baselines, achieving consistent accuracy gains while reducing inference latency compared to uniform slow reasoning.
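The core idea of the abstract, a planner that routes each user sequence to one of three tools, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: all function names are hypothetical, the tool bodies are stubs, and the hand-written difficulty heuristic stands in for the learned planner (which the paper trains via supervised warm-up plus agentic reinforcement learning).

```python
# Sketch of TwiSTAR-style adaptive tool dispatch (illustrative assumptions only).
from dataclasses import dataclass
from typing import List

@dataclass
class ToolResult:
    tool: str          # which tool produced the recommendation
    items: List[str]   # recommended item Semantic IDs

def fast_retrieve(history: List[str]) -> ToolResult:
    # Fast path: direct SID-based generation/retrieval (stub).
    return ToolResult("fast_retriever", history[-1:])

def rank_candidates(history: List[str]) -> ToolResult:
    # Middle path: rerank a small candidate pool (stub).
    return ToolResult("ranker", sorted(set(history))[:3])

def slow_reason(history: List[str]) -> ToolResult:
    # Slow path: generate an explicit rationale before recommending (stub).
    # A real system would condition generation on the rationale text.
    return ToolResult("slow_reasoner", sorted(set(history))[:3])

def planner(history: List[str]) -> ToolResult:
    """Toy heuristic standing in for the learned planner: short or
    repetitive histories are 'easy'; long, diverse ones are 'hard'."""
    diversity = len(set(history)) / max(len(history), 1)
    if len(history) <= 3:
        return fast_retrieve(history)      # easy: cheap direct retrieval
    if diversity < 0.5:
        return rank_candidates(history)    # medium: lightweight reranking
    return slow_reason(history)            # hard: pay for explicit reasoning
```

The point of the sketch is the control flow, not the heuristic: easy cases never touch the expensive reasoning path, which is where the reported latency savings over uniform slow reasoning come from.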