Offline Local Search for Online Stochastic Bandits
Gerdus Benadè, Rathish Das, Thomas Lavastida
TLDR
This paper introduces a generic framework to convert offline local search algorithms into online stochastic bandit algorithms, achieving O(log^3 T) approximate regret.
Key contributions
- Proposes a novel framework to adapt offline local search for online stochastic combinatorial bandits.
- Achieves O(log^3 T) approximate regret, improving on existing offline-to-online frameworks whose (approximate) regret grows polynomially in T.
- Demonstrates the framework's versatility on scheduling to minimize total completion time, finding a minimum-cost base of a matroid, and uncertain clustering.
Why it matters
This work advances online stochastic bandit algorithms by integrating powerful offline local search methods. By improving approximate regret from polynomial to polylogarithmic in T, it paves the way for more efficient solutions to complex online combinatorial optimization problems.
Original Abstract
Combinatorial multi-armed bandits provide a fundamental online decision-making environment where a decision-maker interacts with an environment across $T$ time steps, each time selecting an action and learning the cost of that action. The goal is to minimize regret, defined as the loss compared to the optimal fixed action in hindsight under full-information. There has been substantial interest in leveraging what is known about offline algorithm design in this online setting. Offline greedy and linear optimization algorithms (both exact and approximate) have been shown to provide useful guarantees when deployed online. We investigate local search methods, a broad class of algorithms used widely in both theory and practice, which have thus far been under-explored in this context. We focus on problems where offline local search terminates in an approximately optimal solution and give a generic method for converting such an offline algorithm into an online stochastic combinatorial bandit algorithm with $O(\log^3 T)$ (approximate) regret. In contrast, existing offline-to-online frameworks yield regret (and approximate regret) which depend sub-linearly, but polynomially on $T$. We demonstrate the flexibility of our framework by applying it to three online stochastic combinatorial optimization problems: scheduling to minimize total completion time, finding a minimum cost base of a matroid and uncertain clustering.
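To make the general idea concrete, here is a toy sketch (not the paper's actual algorithm) of local search under bandit feedback. It assumes a hypothetical minimum-cost k-subset problem where the cost of a chosen subset is only observed as a noisy sample; the search repeatedly samples an action to estimate its cost and accepts a single-swap neighbor only when the estimate shows a clear improvement. All function names and parameters below are illustrative.

```python
import random

def noisy_cost(item_means, subset, rng, sigma=0.1):
    # One stochastic observation of the subset's total cost (bandit feedback).
    return sum(item_means[i] + rng.gauss(0.0, sigma) for i in subset)

def estimate(item_means, subset, rng, samples=200):
    # Empirical mean cost over repeated plays of the same action.
    return sum(noisy_cost(item_means, subset, rng) for _ in range(samples)) / samples

def local_search_bandit(item_means, k, rng, eps=0.05):
    # Local search over k-subsets using swap neighborhoods; a move is
    # accepted only if its estimated cost beats the incumbent by > eps,
    # guarding against accepting moves due to sampling noise.
    n = len(item_means)
    current = set(range(k))  # arbitrary starting solution
    cur_est = estimate(item_means, current, rng)
    improved = True
    while improved:
        improved = False
        for i in list(current):
            for j in range(n):
                if j in current:
                    continue
                neighbor = (current - {i}) | {j}
                est = estimate(item_means, neighbor, rng)
                if est < cur_est - eps:  # clear improvement only
                    current, cur_est = neighbor, est
                    improved = True
                    break
            if improved:
                break
    return current
```

With enough samples per evaluation, the search terminates at a local optimum of the empirical costs; the paper's framework is concerned with doing this sample-efficiently enough to obtain polylogarithmic approximate regret, which this sketch does not attempt.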