Offline Local Search for Online Stochastic Bandits
Gerdus Benadè, Rathish Das, Thomas Lavastida
TLDR
This paper introduces a generic framework to convert offline local search algorithms into online stochastic bandit algorithms, achieving O(log^3 T) approximate regret.
Key contributions
- Proposes a novel framework to adapt offline local search for online stochastic combinatorial bandits.
- Achieves O(log^3 T) approximate regret, improving on existing offline-to-online frameworks whose (approximate) regret grows polynomially in T.
- Demonstrates the framework's versatility on scheduling to minimize total completion time, finding a minimum-cost base of a matroid, and uncertain clustering.
Why it matters
This work advances online stochastic bandit algorithms by integrating powerful offline local search methods. By improving approximate regret from polynomial to polylogarithmic in T, it paves the way for more efficient solutions to complex online combinatorial optimization problems.
Original Abstract
Combinatorial multi-armed bandits provide a fundamental online decision-making environment where a decision-maker interacts with an environment across $T$ time steps, each time selecting an action and learning the cost of that action. The goal is to minimize regret, defined as the loss compared to the optimal fixed action in hindsight under full-information. There has been substantial interest in leveraging what is known about offline algorithm design in this online setting. Offline greedy and linear optimization algorithms (both exact and approximate) have been shown to provide useful guarantees when deployed online. We investigate local search methods, a broad class of algorithms used widely in both theory and practice, which have thus far been under-explored in this context. We focus on problems where offline local search terminates in an approximately optimal solution and give a generic method for converting such an offline algorithm into an online stochastic combinatorial bandit algorithm with $O(\log^3 T)$ (approximate) regret. In contrast, existing offline-to-online frameworks yield regret (and approximate regret) which depend sub-linearly, but polynomially on $T$. We demonstrate the flexibility of our framework by applying it to three online stochastic combinatorial optimization problems: scheduling to minimize total completion time, finding a minimum cost base of a matroid and uncertain clustering.
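To make the general idea concrete, here is a toy sketch (not the paper's actual algorithm) of local search under bandit feedback. It assumes a hypothetical minimum-cost k-subset problem where the cost of a chosen subset is only observed as a noisy sample; the search repeatedly samples an action to estimate its cost and accepts a single-swap neighbor only when the estimate shows a clear improvement. All function names and parameters below are illustrative.

```python
import random

def noisy_cost(item_means, subset, rng, sigma=0.1):
    # One stochastic observation of the subset's total cost (bandit feedback).
    return sum(item_means[i] + rng.gauss(0.0, sigma) for i in subset)

def estimate(item_means, subset, rng, samples=200):
    # Empirical mean cost over repeated plays of the same action.
    return sum(noisy_cost(item_means, subset, rng) for _ in range(samples)) / samples

def local_search_bandit(item_means, k, rng, eps=0.05):
    # Local search over k-subsets using swap neighborhoods; a move is
    # accepted only if its estimated cost beats the incumbent by > eps,
    # guarding against accepting moves due to sampling noise.
    n = len(item_means)
    current = set(range(k))  # arbitrary starting solution
    cur_est = estimate(item_means, current, rng)
    improved = True
    while improved:
        improved = False
        for i in list(current):
            for j in range(n):
                if j in current:
                    continue
                neighbor = (current - {i}) | {j}
                est = estimate(item_means, neighbor, rng)
                if est < cur_est - eps:  # clear improvement only
                    current, cur_est = neighbor, est
                    improved = True
                    break
            if improved:
                break
    return current
```

With enough samples per evaluation, the search terminates at a local optimum of the empirical costs; the paper's framework is concerned with doing this sample-efficiently enough to obtain polylogarithmic approximate regret, which this sketch does not attempt.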