Break the Inaccessible Boundary: Distilling Post-Conversion Content for User Retention Modeling
Tianbao Ma, Ruochen Yang, Chengen Li, Yuexin Shi, Jiangxia Cao, et al.
TLDR
OCARM uses a two-stage distillation framework to leverage post-conversion content for improved user retention prediction in real-time bidding without feature leakage.
Key contributions
- Proposes OCARM, a two-stage distillation framework for retention modeling in real-time bidding.
- Addresses feature leakage by implicitly capturing future onboarding content signals.
- Stage 1: Trains a hierarchical teacher encoder using post-conversion content.
- Stage 2: Distills teacher knowledge into a user encoder using only observable features.
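The two stages above can be sketched as a teacher–student alignment. Below is a minimal, self-contained toy illustration (not the paper's implementation): the encoders, dimensions, and the plain MSE distillation loss are all assumptions chosen for brevity; OCARM's actual hierarchical encoder and training objective are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the paper does not publish these sizes).
D_USER, D_CONTENT, D_EMB = 8, 4, 16

# Stage 1: the teacher sees user features AND post-conversion onboarding
# content, producing the "privileged" representation. (Here a fixed random
# linear layer stands in for the trained hierarchical teacher encoder.)
W_teacher = rng.normal(size=(D_USER + D_CONTENT, D_EMB))

def teacher_encode(user, content):
    return np.tanh(np.concatenate([user, content]) @ W_teacher)

# Stage 2: the student (user encoder) sees only observable, pre-conversion
# user features, so it can be used at bidding time without leakage.
W_student = rng.normal(size=(D_USER, D_EMB)) * 0.01

def distill_step(user, content, lr=0.1):
    """One gradient step aligning the student to the frozen teacher (MSE)."""
    global W_student
    t = teacher_encode(user, content)   # frozen teacher target
    s = np.tanh(user @ W_student)       # student representation
    loss = np.mean((s - t) ** 2)
    # Backprop through tanh: dL/dW = outer(user, (2/D)(s - t)(1 - s^2))
    grad = np.outer(user, (2.0 / D_EMB) * (s - t) * (1.0 - s ** 2))
    W_student -= lr * grad
    return loss

user = rng.normal(size=D_USER)
content = rng.normal(size=D_CONTENT)
losses = [distill_step(user, content) for _ in range(200)]
assert losses[-1] < losses[0]  # alignment loss decreases as student matches teacher
```

At serving time only `W_student` and the observable `user` features are needed, which is the point of the distillation: the onboarding-content signal is baked into the student without ever being an input.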
Why it matters
Predicting user retention at bidding time, before the user converts and consumes any content, is crucial for re-engagement advertising, but training directly on post-conversion content causes feature leakage. OCARM offers a practical way to incorporate these informative signals without leakage, yielding consistent improvements in a real-world growth scenario.
Original Abstract
User retention is a key metric to measure long-term engagement in modern platforms. In real-time bidding (RTB) advertising systems for user re-engagement, the retention model is required to predict future revisit probability at bidding time, before the user converts and consumes any content. Although post-conversion content, termed Onboarding Content, provides highly informative signals for retention prediction, directly using it in training causes severe feature leakage and creates a gap between training and serving. To address this issue, we propose OCARM, a two-stage distillation-aligned framework for Onboarding Content Augmented Retention Modeling, enabling the model to implicitly capture future content using only observable features during inference. In the first stage, we deliberately expose onboarding content to train a hierarchical encoder that produces teacher representations. In the second stage, a user encoder is aligned with the frozen teacher through distillation, allowing the model to approximate the inaccessible onboarding signals without leakage. Extensive offline experiments and online A/B tests demonstrate that our framework achieves consistent improvements in a real-world growth scenario.