Efficient Multi-Cohort Inference for Long-Term Effects and Lifetime Value in A/B Testing with User Learning

April 22, 2026 | arXiv:2604.20777

Dario Simionato, Andrea Tonon, Mingxue Wang, Weiguo Wang, Tong Gui + 1 more

cs.LG

TLDR

This paper introduces an efficient multi-cohort inference method that estimates long-term treatment effects and the change in residual lifetime value from short A/B tests, so that product decisions can account for both steady-state impact and churn.

Key contributions

Estimates long-term treatment effects (LTE) and the change in residual lifetime value ($ΔERLV$) from short multi-cohort A/B tests under user learning.
Introduces an inverse-variance weighted estimator that combines per-cohort estimates of the time-varying treatment effect, reducing variance relative to standard approaches in the literature.
Models the estimated treatment trajectory as a parametric decay to recover both the asymptotic treatment effect and the cumulative value generated over time.
Enables simultaneous evaluation of steady-state impact and residual user value within a single experiment.
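
As an illustration of the inverse-variance weighting idea named above, here is a minimal Python sketch. It is the textbook form of the technique, not the paper's exact estimator, which this summary does not spell out. It assumes the per-cohort estimates are independent and target the same effect at a given time step; the function name and the numbers are hypothetical.

```python
def inverse_variance_combine(estimates, variances):
    """Combine independent estimates of one quantity, weighting each by 1/variance.

    Returns the combined estimate and its variance (1 / sum of weights).
    Assumes independent estimates with known, strictly positive variances.
    """
    if not estimates or len(estimates) != len(variances):
        raise ValueError("estimates and variances must be non-empty and equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    combined = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    return combined, 1.0 / total_weight


# Hypothetical per-cohort treatment effect estimates at one time step.
cohort_effects = [0.12, 0.08, 0.10]
cohort_variances = [0.004, 0.001, 0.002]

effect, variance = inverse_variance_combine(cohort_effects, cohort_variances)
print(f"combined effect = {effect:.4f}, variance = {variance:.6f}")
```

Under these assumptions the combined variance is never larger than the smallest cohort variance, which is the sense in which pooling cohorts this way reduces variance.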

Why it matters

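A/B tests are typically evaluated on outcomes observed within a limited experimental horizon, so they can miss how a treatment affects retention and lifetime value. An intervention may look beneficial in the short term and neutral in the long term while still generating less total value than the control because of churn. By estimating both the long-term effect and the change in residual lifetime value within a single experiment, this framework targets exactly those cases, which matter most on platforms such as streaming services where churn is costly.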

Original Abstract

In streaming platforms churn is extremely costly, yet A/B tests are typically evaluated using outcomes observed within a limited experimental horizon. Even when both short- and predicted long-term engagement metrics are considered, they may fail to capture how a treatment affects users' retention. Consequently, an intervention may appear beneficial in the short term and neutral in the long term while still generating lower total value than the control due to users churn. To address this limitation, we introduce a method that estimates long-term treatment effects (LTE) and residual lifetime value change ($ΔERLV$) in short multi-cohort A/B tests under user learning. To estimate time-varying treatment effects efficiently, we introduce an inverse-variance weighted estimator that combines multiple cohorts estimates, reducing variance relative to standard approaches in the literature. The estimated treatment trajectory is then modeled as a parametric decay to recover both the asymptotic treatment effect and the cumulative value generated over time. Our framework enables simultaneous evaluation of steady-state impact and residual user value within a single experiment. Empirical results show improved precision in estimating LTE and $ΔERLV$ and identify scenarios in which relying on either short-term or long-term metrics alone would lead to incorrect product decisions.
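
The abstract's step of fitting a parametric decay to the treatment trajectory can be sketched as follows. The abstract does not give the functional form, so this example assumes an exponential decay toward an asymptote, `effect(t) = lte + amp * exp(-rate * t)`. The function names, the grid of candidate rates, and the synthetic data are all hypothetical. The finite-horizon sum at the end illustrates a cumulative effect only; it is not the paper's definition of $ΔERLV$.

```python
import math


def fit_exponential_decay(times, effects, candidate_rates):
    """Fit effects ~ lte + amp * exp(-rate * t).

    Grid-searches the decay rate; for each candidate rate, (lte, amp) are
    obtained by closed-form ordinary least squares.
    """
    n = len(times)
    mean_y = sum(effects) / n
    best = None
    for rate in candidate_rates:
        x = [math.exp(-rate * t) for t in times]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # degenerate regressor (e.g. rate == 0), skip
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, effects))
        amp = sxy / sxx
        lte = mean_y - amp * mean_x
        sse = sum((lte + amp * xi - yi) ** 2 for xi, yi in zip(x, effects))
        if best is None or sse < best[0]:
            best = (sse, lte, amp, rate)
    if best is None:
        raise ValueError("no usable candidate rate")
    _, lte, amp, rate = best
    return lte, amp, rate


def cumulative_effect(lte, amp, rate, horizon):
    """Sum the fitted per-period effect over periods 0 .. horizon - 1."""
    return sum(lte + amp * math.exp(-rate * t) for t in range(horizon))


# Synthetic, noise-free trajectory generated from the assumed model.
times = list(range(10))
effects = [0.05 + 0.20 * math.exp(-0.30 * t) for t in times]
candidate_rates = [0.01 * i for i in range(1, 101)]

lte_hat, amp_hat, rate_hat = fit_exponential_decay(times, effects, candidate_rates)
total_30 = cumulative_effect(lte_hat, amp_hat, rate_hat, horizon=30)
print(f"asymptotic effect = {lte_hat:.3f}, decay rate = {rate_hat:.2f}")
```

In this sketch the fitted asymptote `lte_hat` plays the role of the long-term effect. With real data, each time step's estimate would carry its own variance, and a weighted fit would be the natural extension.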
