ArXiv TLDR

Benefits and Costs of Adaptive Sampling

2604.24652

Yu-Shiou Willy Lin, Dae Woong Ham, Iavor Bojinov

stat.ME, econ.EM

TLDR

This paper characterizes when adaptive sampling improves estimation precision over uniform designs and proposes policies that balance inference quality with the online cost of experimentation.

Key contributions

  • Characterizes when adaptive Neyman allocation, which assigns samples in proportion to arm variance, strictly improves estimation precision (MSE) over uniform sampling under variance heterogeneity.
  • Introduces the Static-Allocation Rate Policy (SARP) and Neyman-Adaptive Rate Policy (NARP) for a joint inference-regret objective in multi-armed bandits.
  • SARP and NARP converge to the complete-information benchmark at the optimal rate and interpolate between inference-oriented and regret-oriented policies.
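The precision gain from Neyman allocation can be seen directly from the MSE formula: for sample-mean estimators, the summed MSE is Σ σ_k²/n_k, which is minimized by setting n_k proportional to σ_k. A minimal sketch, using illustrative variances not taken from the paper:

```python
import numpy as np

# Hypothetical two-arm instance with heterogeneous standard deviations
# (illustrative values only, not from the paper).
sigmas = np.array([1.0, 4.0])
budget = 200  # total number of samples to allocate

def total_mse(n_per_arm):
    """Summed MSE of the arm-mean estimators: sum of sigma_k^2 / n_k."""
    return float(np.sum(sigmas**2 / n_per_arm))

# Uniform design: split the budget evenly across arms.
uniform = np.full(2, budget // 2)          # [100, 100]

# Neyman allocation: n_k proportional to sigma_k.
neyman = np.round(budget * sigmas / sigmas.sum()).astype(int)  # [40, 160]

print(total_mse(uniform))  # 0.17
print(total_mse(neyman))   # 0.125
```

The gap (0.125 vs 0.17) appears at a modest budget, consistent with the paper's point that the improvement is not merely asymptotic.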

Why it matters

This paper addresses a fundamental question in multi-armed bandits: when does adaptive sampling truly improve estimation? It provides theoretical guarantees for precision gains and introduces novel policies that balance inference needs against the online costs of experimentation, offering a flexible approach for practitioners.

Original Abstract

Multi-armed bandits are widely used for sequential experimentation in clinical trials, recommendation systems, and online platforms. While regret minimization and valid inference from adaptively collected data have each been studied extensively, a basic question remains: when does adaptivity \emph{improve estimation precision} relative to uniform designs, and how should inference be balanced against the online cost of experimentation? We first study arm-level mean estimation under mean-squared-error (MSE) objectives. We characterize when an adaptive Neyman allocation, which allocates samples according to arm variance, yields strict MSE improvements over uniform sampling. When there is variance heterogeneity across arms, these improvements arise at modest sample sizes, clarifying that adaptivity can be preferable for inference not only asymptotically, but also in many practical finite-sample settings. We then study a joint inference-regret objective that accounts for the cost of assigning units to inferior arms during experimentation. We propose the Static-Allocation Rate Policy (SARP) and Neyman-Adaptive Rate Policy (NARP), which interpolates between inference- and regret-oriented policies by adjusting exploration to the local structure of the instance. We show that SARP and NARP converge to the complete-information benchmark at the optimal rate as the sampling budget grows. Our proposed policies are practically attractive as it linearly interpolates between any standard regret-minimizing algorithm and inference-targeting adaptive policies. Yet we show it still enjoys the oracle-based asymptotic optimal rate. Simulations support the theory by demonstrating improved precision over uniform allocation while controlling performance loss across a range of instances.
