ArXiv TLDR

SID-Coord: Coordinating Semantic IDs for ID-based Ranking in Short-Video Search

2604.10471

Guowen Li, Yuepeng Zhang, Shunyu Zhang, Yi Zhang, Xiaoze Jiang + 2 more

cs.IR

TLDR

SID-Coord adds discrete, trainable semantic IDs (SIDs) to ID-based short-video search ranking and coordinates them with hashed item identifiers (HIDs), balancing memorization and generalization to better serve long-tail items.

Key contributions

  • Introduces attention-based fusion over hierarchical SIDs to capture multi-level semantics.
  • Develops a target-aware HID-SID gating mechanism for adaptive memorization-generalization balance.
  • Implements a SID-driven interest alignment module that models the semantic similarity distribution between target items and user histories.
  • Reports statistically significant online A/B gains: +0.664% long-play rate in search and +0.369% search playback duration.
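
As a rough illustration of the first contribution, attention-based fusion over hierarchical SIDs can be sketched as attention pooling over one embedding per SID level. This is a minimal sketch, not the paper's implementation: the function name `fuse_sid_levels`, the use of a single query vector, the scaled dot-product scoring, and all numeric values are assumptions made for the example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse_sid_levels(level_embeddings, query):
    """Attention-pool one embedding per SID level into a single vector.

    level_embeddings: list of equal-length vectors, ordered coarse to fine.
    query: vector of the same length (hypothetical; e.g. a learned or
           context-derived query).
    """
    dim = len(query)
    scores = [dot(e, query) / math.sqrt(dim) for e in level_embeddings]
    weights = softmax(scores)
    fused = [
        sum(w * e[i] for w, e in zip(weights, level_embeddings))
        for i in range(dim)
    ]
    return fused, weights

# Hypothetical 3-level SID, each level already looked up to a 4-d embedding.
levels = [
    [0.2, 0.1, 0.0, 0.5],  # coarse level
    [0.9, 0.3, 0.1, 0.0],  # middle level
    [0.1, 0.8, 0.4, 0.2],  # fine level
]
query = [1.0, 0.0, 0.0, 0.0]
fused, weights = fuse_sid_levels(levels, query)
```

In this toy setup the middle level receives the largest weight because its embedding aligns best with the query; in a trained model the weights would be learned rather than fixed by hand.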

Why it matters

ID-based ranking models memorize frequent interactions well but generalize poorly to long-tail items with limited exposure. SID-Coord addresses this by coordinating semantic IDs (SIDs) with the existing hashed item identifiers (HIDs), without modifying the backbone model. Online A/B tests in a production short-video search system show statistically significant gains in long-play rate and playback duration.
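
One way to picture the coordination between HID-based memorization and SID-based generalization is a learned gate that mixes the two embeddings of the target item. The sketch below is only an assumption about the general shape of such a gate: a scalar sigmoid over the concatenated embeddings, with hypothetical names (`gated_item_embedding`, `gate_weights`, `gate_bias`) and hand-picked values. The paper's actual gating design may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_item_embedding(hid_emb, sid_emb, gate_weights, gate_bias):
    """Mix HID and SID embeddings of the target item with a scalar gate.

    g close to 1 leans on the HID embedding (memorization);
    g close to 0 leans on the SID embedding (generalization).
    """
    gate_input = hid_emb + sid_emb  # list concatenation, not addition
    logit = sum(w * x for w, x in zip(gate_weights, gate_input)) + gate_bias
    g = sigmoid(logit)
    mixed = [g * h + (1.0 - g) * s for h, s in zip(hid_emb, sid_emb)]
    return mixed, g

hid_emb = [1.0, 0.0]
sid_emb = [0.0, 1.0]

# With zero weights and zero bias the gate is neutral (g = 0.5).
neutral, g_neutral = gated_item_embedding(hid_emb, sid_emb, [0.0] * 4, 0.0)

# A large negative bias pushes the gate toward the SID side, as one might
# expect for a long-tail item whose HID embedding is poorly trained.
tail, g_tail = gated_item_embedding(hid_emb, sid_emb, [0.0] * 4, -10.0)
```

In practice the gate parameters would be trained jointly with the ranking model, so that frequently seen items rely more on their HID embedding and rarely seen items fall back on semantics.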

Original Abstract

Large-scale short-video search ranking models are typically trained on sparse co-occurrence signals over hashed item identifiers (HIDs). While effective at memorizing frequent interactions, such ID-based models struggle to generalize to long-tailed items with limited exposure. This memorization-generalization trade-off remains a longstanding challenge in such industrial systems. We propose SID-Coord, a lightweight Semantic ID framework that incorporates discrete, trainable semantic IDs (SIDs) directly into ID-based ranking models. Instead of treating semantic signals as auxiliary dense features, SID-Coord represents semantics as structured identifiers and coordinates HID-based memorization with SID-based generalization within a unified modeling framework. To enable effective coordination, SID-Coord introduces three components: (1) an attention-based fusion module over hierarchical SIDs to capture multi-level semantics, (2) a target-aware HID-SID gating mechanism that adaptively balances memorization and generalization, and (3) a SID-driven interest alignment module that models the semantic similarity distribution between target items and user histories. SID-Coord can be integrated into existing production ranking systems without modifying the backbone model. Online A/B experiments in a real-world production environment show statistically significant improvements, with a +0.664% gain in long-play rate in search and a +0.369% increase in search playback duration.
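
To make the third component more concrete, a semantic similarity distribution between a target item and a user's history could be approximated by how deeply each history item's SID matches the target's SID, level by level. This is a hypothetical sketch only: the prefix-match measure, the function names, and the example SID tuples are assumptions for illustration, and the abstract does not specify how the distribution is actually computed.

```python
def shared_prefix_depth(sid_a, sid_b):
    # Number of leading SID levels on which two items agree.
    depth = 0
    for a, b in zip(sid_a, sid_b):
        if a != b:
            break
        depth += 1
    return depth

def similarity_distribution(target_sid, history_sids):
    """Normalized histogram of prefix-match depth between target and history.

    Index d holds the fraction of history items sharing exactly d leading
    SID levels with the target (0 = unrelated, len(target_sid) = same SID).
    """
    num_levels = len(target_sid)
    counts = [0] * (num_levels + 1)
    for sid in history_sids:
        counts[shared_prefix_depth(target_sid, sid)] += 1
    total = len(history_sids)
    if total == 0:
        return [0.0] * (num_levels + 1)
    return [c / total for c in counts]

# Hypothetical 3-level SIDs (coarse, middle, fine).
target = (12, 7, 3)
history = [(12, 7, 3), (12, 7, 9), (12, 1, 3), (4, 7, 3)]
dist = similarity_distribution(target, history)
```

A histogram like this could be fed to the ranker as a compact feature describing how semantically close the target is to what the user has watched, even when the target's own HID has little interaction data.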
