Mitigating Collaborative Semantic ID Staleness in Generative Retrieval
Vladimir Baikalov, Iskander Bagautdinov, Sergey Muravyov
TLDR
A new method mitigates Semantic ID staleness in generative retrieval, boosting performance and significantly cutting retraining compute.
Key contributions
- Analyzes Semantic ID (SID) staleness in generative retrieval under strict chronological evaluation.
- Proposes a lightweight, model-agnostic SID alignment update to refresh SIDs without full retraining.
- Aligns refreshed SIDs to existing vocabulary, enabling warm-start fine-tuning and checkpoint compatibility.
- Achieves consistent improvements in Recall@K and nDCG@K at high cutoffs, and reduces retriever-training compute by roughly 8-9x versus full retraining.
Why it matters
SID staleness is a critical, often-overlooked problem in generative retrieval systems that rely on dynamic user interaction data. This work offers a practical, computationally efficient way to maintain model performance over time without costly full retraining, making generative retrieval more adaptable and sustainable.
Original Abstract
Generative retrieval with Semantic IDs (SIDs) assigns each item a discrete identifier and treats retrieval as a sequence generation problem rather than a nearest-neighbor search. While content-only SIDs are stable, they do not take into account user-item interaction patterns, so recent systems construct interaction-informed SIDs. However, as interaction patterns drift over time, these identifiers become stale, i.e., their collaborative semantics no longer match recent logs. Prior work typically assumes a fixed SID vocabulary during fine-tuning, or treats SID refresh as a full rebuild that requires retraining. However, SID staleness under temporal drift is rarely analyzed explicitly. To bridge this gap, we study SID staleness under strict chronological evaluation and propose a lightweight, model-agnostic SID alignment update. Given refreshed SIDs derived from recent logs, we align them to the existing SID vocabulary so the retriever checkpoint remains compatible, enabling standard warm-start fine-tuning without a full rebuild-and-retrain pipeline. Across three public benchmarks, our update consistently improves Recall@K and nDCG@K at high cutoffs over naive fine-tuning with stale SIDs and reduces retriever-training compute by approximately 8-9 times compared to full retraining.
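The abstract's core idea is to map refreshed SIDs onto the existing SID vocabulary so the retriever checkpoint stays compatible. The paper's exact alignment procedure is not reproduced here; a minimal sketch, assuming SIDs come from codebooks of centroid vectors and that alignment can be posed as minimum-cost bipartite matching between old and refreshed centroids (the function name `align_codebooks` and this formulation are illustrative assumptions, not the authors' implementation):

```python
# Hypothetical sketch: align a refreshed SID codebook to the existing one
# via Hungarian (minimum-cost bipartite) matching on centroid distances,
# so refreshed codes reuse old vocabulary slots and the retriever's
# embedding table / checkpoint remains compatible for warm-start tuning.
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_codebooks(old_codebook: np.ndarray, new_codebook: np.ndarray) -> np.ndarray:
    """Return mapping[j] = old-vocabulary index assigned to refreshed code j."""
    # cost[i, j] = squared Euclidean distance between old code i and new code j
    diff = old_codebook[:, None, :] - new_codebook[None, :, :]
    cost = np.einsum("ijk,ijk->ij", diff, diff)
    old_idx, new_idx = linear_sum_assignment(cost)
    mapping = np.empty(len(new_codebook), dtype=int)
    mapping[new_idx] = old_idx
    return mapping


# Toy check: refreshed codebook is the old one reversed plus small drift,
# so alignment should recover the reversal.
rng = np.random.default_rng(0)
old = rng.normal(size=(8, 4))
new = old[::-1] + 0.01 * rng.normal(size=(8, 4))
mapping = align_codebooks(old, new)
print(mapping)
```

Under this assumption, items keep valid identifiers drawn from the original vocabulary after a refresh, which is what allows standard warm-start fine-tuning instead of a full rebuild-and-retrain pipeline.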