Analysis of Search Heuristics in the Multi-Armed Bandit Setting
Jasmin Brandt, Barbara Hammer, Timo Kötzing, Jurek Sander
TLDR
This paper analyzes search heuristics in the Dueling Bandits setting, showing that evolutionary algorithms struggle to identify the Condorcet winner while a simple EDA and repeated duels fare much better.
Key contributions
- Evolutionary algorithms (EAs) are poor at identifying Condorcet winners in Dueling Bandits.
- The (1+1) EA selects the Condorcet winner in its stationary distribution only with constant probability when the winner's loss probability satisfies p = Ω(1/n).
- A simple EDA (based on the Max-Min Ant System with iteration-best update) identifies the Condorcet winner in its maintained distribution with probability 1 − Θ(p).
- Replacing single duels with repeated duels significantly boosts the probability that the (1+1) EA's stationary distribution favors the Condorcet winner.
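To make the setting concrete, here is a minimal sketch of a stochastic dueling-bandit instance together with a (1+1) EA-style search loop. The model is a simplifying assumption, not the paper's exact one: arm 0 is the Condorcet winner and beats every other arm with probability 1 − p, while all other pairs are decided by a fair coin; the EA keeps a single current arm, draws a uniform random challenger, and keeps whichever arm wins one duel.

```python
import random

rng = random.Random(0)

def duel(i, j, p):
    """One stochastic duel. Arm 0 (the assumed Condorcet winner) beats any
    other arm with probability 1 - p; all other pairs are fair coin flips.
    This is an illustrative model, not the paper's exact setup."""
    if i == 0 or j == 0:
        favored, other = (i, j) if i == 0 else (j, i)
        return favored if rng.random() < 1 - p else other
    return i if rng.random() < 0.5 else j

def one_plus_one_ea(n, p, steps):
    """(1+1) EA-style loop over n arms: keep one current arm, draw a
    uniformly random challenger each step, keep the winner of a single
    duel. Returns the fraction of steps spent on the Condorcet winner."""
    current = rng.randrange(n)
    hits = 0
    for _ in range(steps):
        challenger = rng.randrange(n)
        current = duel(current, challenger, p)
        hits += (current == 0)
    return hits / steps
```

Running `one_plus_one_ea(10, 0.2, 20000)` in this toy model illustrates the paper's negative result: even though arm 0 wins 80% of its duels, the search spends only a constant fraction of its time on it, because a single lost duel is enough to displace it.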
Why it matters
This work highlights critical limitations of common evolutionary algorithms in identifying optimal solutions within comparison-based settings. It offers valuable insights for designing more effective search heuristics and improving existing ones.
Original Abstract
We consider the classic Multi-Armed Bandit setting to understand the exploration/exploitation tradeoffs made by different search heuristics. Since many search heuristics work by comparing different options (in evolutionary algorithms called "individuals"; in the Bandit literature called "arms"), we work with the "Dueling Bandits" setting. In each iteration, a comparison between different arms can be made; in the binary stochastic setting, each arm has a fixed winning probability against any other arm. A Condorcet winner is any arm that beats every other arm with a probability strictly higher than $1/2$. We show that evolutionary algorithms are rather bad at identifying the Condorcet winner: Even if the Condorcet winner beats every other arm with a probability $1-p$, the (1+1) EA, in its stationary distribution, chooses the Condorcet winner only with constant probability if $p=\Omega(1/n)$. By contrast, we show that a simple EDA (based on the Max-Min Ant System with iteration-best update) will choose the Condorcet winner in its maintained distribution with probability $1-\Theta(p)$. As a remedy for the (1+1) EA, we show how repeated duels can significantly boost the probability of the Condorcet winner in the stationary distribution.
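The abstract's remedy for the (1+1) EA is to base each comparison on repeated duels rather than a single one. One natural way to do this (an illustrative parametrization, not necessarily the paper's exact scheme) is a best-of-$(2k+1)$ majority vote: if the Condorcet winner wins each single duel with probability $1-p > 1/2$, the probability that it loses the majority vote shrinks rapidly as $k$ grows.

```python
import random

rng = random.Random(1)

def majority_duel(p, k):
    """Compare the Condorcet winner (per-duel win probability 1 - p)
    against a challenger via a best-of-(2k+1) majority vote.
    Returns True iff the Condorcet winner prevails."""
    wins = sum(rng.random() < 1 - p for _ in range(2 * k + 1))
    return wins > k

def error_rate(p, k, trials=10_000):
    """Empirical probability that the majority vote picks the wrong arm."""
    return sum(not majority_duel(p, k) for _ in range(trials)) / trials
```

For example, with $p = 0.3$ a single duel (`k = 0`) errs about 30% of the time, while a best-of-11 vote (`k = 5`) errs far less often, which is why repeated duels stabilize the Condorcet winner in the stationary distribution.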