ArXiv TLDR

Best of both worlds: Stochastic & adversarial best-arm identification

2604.14860

Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek, Michal Valko

stat.ML, cs.LG

TLDR

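This paper shows that no single best-arm identification learner can be optimal in both stochastic and adversarial bandit problems in general, and introduces a parameter-free algorithm that matches (up to log factors) the best error rate achievable in stochastic problems by a learner that must also stay robust to adversarial rewards.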

Key contributions

  • Proves that, in general, no single learner can be optimal in both the stochastic and the adversarial problem.
  • Gives a lower bound characterizing the optimal error rate in stochastic problems for any strategy constrained to be robust to adversarial rewards.
  • Introduces a simple, parameter-free algorithm whose probability of error matches this lower bound (up to log factors) in stochastic problems.
  • Shows that the same algorithm is also robust to adversarial rewards.

Why it matters

A uniform random learner is optimal against adversarial rewards but suboptimal when rewards are stochastic, and the learner usually does not know which setting it faces. This paper shows the trade-off is inherent: robustness to adversarial rewards limits optimal error rates to a subset of stochastic problems. It then characterizes the best rate achievable under that constraint and gives a parameter-free algorithm that attains it up to log factors.

Original Abstract

We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.
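
To make the baseline in the abstract concrete, here is a minimal sketch of a uniform random learner for best-arm identification. It is an illustration under stated assumptions, not the paper's algorithm: the function name `uniform_best_arm`, the full reward table passed as input, and the importance-weighted recommendation rule are all choices made for this example, since the abstract does not specify them.

```python
import random


def uniform_best_arm(rewards, seed=0):
    """Sketch of a uniform random learner (hypothetical helper, not from the paper).

    rewards: list of rounds; each round is a list with one reward in [0, 1] per arm.
             The learner only looks at the reward of the arm it pulls.
    Returns the index of the recommended arm.
    """
    rng = random.Random(seed)
    num_arms = len(rewards[0])
    estimates = [0.0] * num_arms

    for round_rewards in rewards:
        # Pull an arm uniformly at random, independent of past observations.
        arm = rng.randrange(num_arms)
        # Assumed estimator: importance weighting. Dividing the observed reward
        # by the pull probability 1/K (i.e. multiplying by K) gives an unbiased
        # estimate of each arm's cumulative reward.
        estimates[arm] += round_rewards[arm] * num_arms

    # Recommend the arm with the largest estimated cumulative reward.
    return max(range(num_arms), key=lambda k: estimates[k])


# Toy reward table: arm 2 always pays 1, the other arms pay 0.
toy_rewards = [[0.0, 0.0, 1.0] for _ in range(200)]
recommended = uniform_best_arm(toy_rewards)
```

Because the sampling distribution never adapts to what has been observed, an adversary gains nothing by reacting to the learner's past pulls. That same non-adaptivity is why this kind of strategy spends too many pulls on clearly bad arms when the rewards are stochastic.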
