Best of both worlds: Stochastic & adversarial best-arm identification

April 16, 2026 · arXiv:2604.14860

Yasin Abbasi-Yadkori, Peter L. Bartlett, Victor Gabillon, Alan Malek, Michal Valko

stat.ML, cs.LG

TLDR

This paper introduces an algorithm for best-arm identification that performs optimally in stochastic bandit problems while remaining robust to adversarial rewards.

Key contributions

Proves that, in general, no single learner can be optimal in both the stochastic and the adversarial setting.
Gives a lower bound on the error rate achievable in stochastic problems by any learner constrained to be robust to adversarial rewards.
Introduces a simple, parameter-free algorithm whose probability of error matches this lower bound (up to log factors) in stochastic problems.
Shows that the same algorithm is also robust to adversarial rewards.

Why it matters

This research addresses a fundamental challenge in best-arm identification by showing that optimal performance in stochastic settings inherently trades off against robustness to adversarial rewards: a robust learner can guarantee optimal error rates only on a subset of stochastic problems. It then provides a simple, parameter-free algorithm whose error probability matches the resulting lower bound (up to log factors) in stochastic problems while remaining robust to adversarial ones.

Original Abstract

We study bandit best-arm identification with arbitrary and potentially adversarial rewards. A simple random uniform learner obtains the optimal rate of error in the adversarial scenario. However, this type of strategy is suboptimal when the rewards are sampled stochastically. Therefore, we ask: Can we design a learner that performs optimally in both the stochastic and adversarial problems while not being aware of the nature of the rewards? First, we show that designing such a learner is impossible in general. In particular, to be robust to adversarial rewards, we can only guarantee optimal rates of error on a subset of the stochastic problems. We give a lower bound that characterizes the optimal rate in stochastic problems if the strategy is constrained to be robust to adversarial rewards. Finally, we design a simple parameter-free algorithm and show that its probability of error matches (up to log factors) the lower bound in stochastic problems, and it is also robust to adversarial ones.
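The abstract's baseline, a learner that pulls arms uniformly at random, can be sketched in a few lines. This is only a minimal illustration of that baseline, not the paper's parameter-free algorithm, whose details are not given here. The names `uniform_best_arm` and `reward_fn`, the importance-weighted gain estimate, and the fixed-budget `horizon` are assumptions made for the example.

```python
import random


def uniform_best_arm(reward_fn, n_arms, horizon, seed=0):
    """Hypothetical sketch of a uniform random learner for best-arm identification.

    reward_fn(t, arm) returns the reward in [0, 1] of `arm` at round `t`;
    it may be stochastic or chosen by an adversary (assumed interface).
    """
    rng = random.Random(seed)
    est_gain = [0.0] * n_arms  # estimated cumulative gain of each arm

    for t in range(horizon):
        arm = rng.randrange(n_arms)  # each arm is pulled with probability 1/K
        reward = reward_fn(t, arm)
        # Importance weighting by 1 / (1/K) = K keeps the estimate of each
        # arm's cumulative gain unbiased, whatever the reward sequence is.
        est_gain[arm] += reward * n_arms

    # Recommend the arm with the largest estimated cumulative gain.
    return max(range(n_arms), key=lambda k: est_gain[k])


if __name__ == "__main__":
    # Toy problem with fixed per-arm rewards (an assumption for illustration).
    means = [0.2, 0.5, 0.8]
    print(uniform_best_arm(lambda t, a: means[a], n_arms=3, horizon=3000))
```

Because the sampling distribution never depends on past rewards, an adversary cannot steer it, which is the intuition behind its robustness. The cost is that the budget is spread evenly even when some arms are clearly bad, which is why the abstract calls this strategy suboptimal when rewards are stochastic.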
