ArXiv TLDR

CS3: Efficient Online Capability Synergy for Two-Tower Recommendation

arXiv:2604.19269

Lixiang Wang, Shaoyun Shi, Peng Wang, Wenjin Wu, Peng Jiang

cs.IR

TLDR

CS3 is an efficient online framework that enhances two-tower recommender systems by improving representation, alignment, and cross-feature interactions.

Key contributions

  • Cycle-Adaptive Structure for self-revision via adaptive feature denoising within each tower.
  • Cross-Tower Synchronization to improve alignment through lightweight mutual awareness between towers.
  • Cascade-Model Sharing to enhance cross-stage consistency by reusing knowledge from downstream models.

Why it matters

Two-tower recommenders are efficient but limited in representation capacity, embedding alignment, and cross-feature interaction. CS3 addresses these limitations with three lightweight mechanisms, delivering up to an 8.36% revenue improvement in deployment while preserving real-time online serving. This makes it a practical and powerful upgrade for large-scale systems.

Original Abstract

To balance effectiveness and efficiency in recommender systems, multi-stage pipelines commonly use lightweight two-tower models for large-scale candidate retrieval. However, the isolated two-tower architecture restricts representation capacity, embedding-space alignment, and cross-feature interactions. Existing solutions such as late interaction and knowledge distillation can mitigate these issues, but often increase latency or are difficult to deploy in online learning settings. We propose Capability Synergy (CS3), an efficient online framework that strengthens two-tower retrievers while preserving real-time constraints. CS3 introduces three mechanisms: (1) Cycle-Adaptive Structure for self-revision via adaptive feature denoising within each tower; (2) Cross-Tower Synchronization to improve alignment through lightweight mutual awareness between towers; and (3) Cascade-Model Sharing to enhance cross-stage consistency by reusing knowledge from downstream models. CS3 is plug-and-play with diverse two-tower backbones and compatible with online learning. Experiments on three public datasets show consistent gains over strong baselines, and deployment in a large-scale advertising system yields up to 8.36% revenue improvement across three scenarios while maintaining ms-level latency.
