Break the Inaccessible Boundary: Distilling Post-Conversion Content for User Retention Modeling
Tianbao Ma, Ruochen Yang, Chengen Li, Yuexin Shi, Jiangxia Cao, et al.
TLDR
OCARM uses a two-stage distillation framework to leverage post-conversion content for improved user retention prediction in real-time bidding without feature leakage.
Key contributions
- Proposes OCARM, a two-stage distillation framework for retention modeling in real-time bidding.
- Addresses feature leakage by implicitly capturing future onboarding content signals.
- Stage 1: Trains a hierarchical teacher encoder using post-conversion content.
- Stage 2: Distills teacher knowledge into a user encoder using only observable features.
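The two stages above can be sketched as a teacher–student alignment. Below is a minimal, self-contained toy illustration (not the paper's implementation): the encoders, dimensions, and the plain MSE distillation loss are all assumptions chosen for brevity; OCARM's actual hierarchical encoder and training objective are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical; the paper does not publish these sizes).
D_USER, D_CONTENT, D_EMB = 8, 4, 16

# Stage 1: the teacher sees user features AND post-conversion onboarding
# content, producing the "privileged" representation. (Here a fixed random
# linear layer stands in for the trained hierarchical teacher encoder.)
W_teacher = rng.normal(size=(D_USER + D_CONTENT, D_EMB))

def teacher_encode(user, content):
    return np.tanh(np.concatenate([user, content]) @ W_teacher)

# Stage 2: the student (user encoder) sees only observable, pre-conversion
# user features, so it can be used at bidding time without leakage.
W_student = rng.normal(size=(D_USER, D_EMB)) * 0.01

def distill_step(user, content, lr=0.1):
    """One gradient step aligning the student to the frozen teacher (MSE)."""
    global W_student
    t = teacher_encode(user, content)   # frozen teacher target
    s = np.tanh(user @ W_student)       # student representation
    loss = np.mean((s - t) ** 2)
    # Backprop through tanh: dL/dW = outer(user, (2/D)(s - t)(1 - s^2))
    grad = np.outer(user, (2.0 / D_EMB) * (s - t) * (1.0 - s ** 2))
    W_student -= lr * grad
    return loss

user = rng.normal(size=D_USER)
content = rng.normal(size=D_CONTENT)
losses = [distill_step(user, content) for _ in range(200)]
assert losses[-1] < losses[0]  # alignment loss decreases as student matches teacher
```

At serving time only `W_student` and the observable `user` features are needed, which is the point of the distillation: the onboarding-content signal is baked into the student without ever being an input.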
Why it matters
Predicting user retention at bidding time, before the user converts and consumes any content, is crucial for re-engagement advertising, but training directly on post-conversion content causes feature leakage. OCARM offers a practical way to incorporate these informative signals without leakage, yielding consistent improvements in a real-world growth scenario.
Original Abstract
User retention is a key metric to measure long-term engagement in modern platforms. In real-time bidding (RTB) advertising systems for user re-engagement, the retention model is required to predict future revisit probability at bidding time, before the user converts and consumes any content. Although post-conversion content, termed Onboarding Content, provides highly informative signals for retention prediction, directly using it in training causes severe feature leakage and creates a gap between training and serving. To address this issue, we propose OCARM, a two-stage distillation-aligned framework for Onboarding Content Augmented Retention Modeling, enabling the model to implicitly capture future content using only observable features during inference. In the first stage, we deliberately expose onboarding content to train a hierarchical encoder that produces teacher representations. In the second stage, a user encoder is aligned with the frozen teacher through distillation, allowing the model to approximate the inaccessible onboarding signals without leakage. Extensive offline experiments and online A/B tests demonstrate that our framework achieves consistent improvements in a real-world growth scenario.