ArXiv TLDR

Prior-Free Sample Size Design for Test-and-Roll Experiments

2605.02414

Kentaro Kawato, Shosei Sakaguchi

econ.EM, stat.ME

TLDR

This paper proposes the Worst-case Marginal Benefit (WMB) rule for choosing the sample size of test-and-roll experiments, which yields a rule-of-thirds benchmark: experiment on roughly m ≈ N/3 of the N units.

Key contributions

  • Identifies limitations of the standard absolute minimax regret criterion for sample size design.
  • Introduces the Worst-case Marginal Benefit (WMB) rule for welfare-aware sample-size choice.
  • Establishes a "rule-of-thirds" benchmark, suggesting an optimal sample size of m ≈ N/3.
  • Derives the N/3 benchmark approximately for Bernoulli outcomes (via a Gaussian approximation, after excluding pathological cases) and exactly for Gaussian outcomes with a known common variance.

Why it matters

This paper provides a practical, prior-free method for determining sample sizes in test-and-roll experiments, addressing a common challenge in A/B testing and sequential decision-making. The "rule-of-thirds" offers a simple, actionable guideline for practitioners.

Original Abstract

This paper studies sample-size design for finite-population test-and-roll experiments, where a decision-maker first conducts an experiment on $m$ units and then assigns the remaining $N-m$ units to the treatment that performs better in the experiment. We consider welfare-aware sample-size choice, which involves an exploration-exploitation tradeoff: larger experiments improve the rollout decision but impose welfare losses on experimental units assigned to the inferior treatment. We show that the standard absolute minimax regret criterion can lead to implausibly small experiments by over-penalizing exploration in its worst-case objective. To address this limitation, we propose the Worst-case Marginal Benefit (WMB) rule, which compares the worst-case marginal benefit of adding one more matched pair to the experiment with the corresponding marginal exploration cost. We establish a simple rule-of-thirds benchmark. For Bernoulli outcomes, after excluding pathological cases, the WMB criterion yields the optimal sample size of $m \approx N/3$ through a Gaussian approximation. For Gaussian outcomes with a known common variance, the same benchmark arises exactly. These results provide a prior-free and practically implementable guide for welfare-based sample-size design.
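The exploration-exploitation tradeoff described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's WMB rule. It only evaluates the expected regret of a test-and-roll design for one fixed effect size, under assumptions that go beyond the abstract: Gaussian outcomes with known common standard deviation `sigma`, an even `m` split equally between the two arms, a true mean gap `delta`, and a coin flip when there is no data. The function names `normal_cdf` and `expected_regret` are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def expected_regret(N: int, m: int, delta: float, sigma: float) -> float:
    """Expected welfare loss of a test-and-roll design (illustrative only).

    Assumptions (not taken from the paper): m is even and split equally
    between two arms, outcomes are Gaussian with known common standard
    deviation sigma, and delta is the true gap between the arm means.
    """
    if m % 2 != 0 or not 0 <= m <= N:
        raise ValueError("m must be even and satisfy 0 <= m <= N")
    if sigma <= 0:
        raise ValueError("sigma must be positive")

    gap = abs(delta)
    n = m // 2  # units per arm, i.e. number of matched pairs

    # Exploration cost: n experimental units receive the inferior arm.
    explore = n * gap

    # The difference in sample means is Normal(gap, 2 * sigma^2 / n), so the
    # inferior arm wins the experiment with probability Phi(-gap*sqrt(n/2)/sigma).
    # With n == 0 this evaluates to 0.5, matching a coin-flip rollout.
    p_wrong = normal_cdf(-gap * math.sqrt(n / 2.0) / sigma)

    # Exploitation cost: the remaining N - m units all get the chosen arm.
    exploit = (N - m) * gap * p_wrong

    return explore + exploit


if __name__ == "__main__":
    # Compare a few experiment sizes for one hypothetical effect size.
    for m in (0, 100, 334, 600, 1000):
        print(m, round(expected_regret(N=1000, m=m, delta=0.1, sigma=1.0), 2))
```

The regret-minimizing `m` in this sketch depends on the chosen `delta`, which is unknown in practice. That dependence is why a prior-free criterion such as the minimax regret or WMB rule discussed in the abstract is needed to pick a single `m`.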
