Tight Sample Complexity Bounds for Best-Arm Identification Under Bounded Systematic Bias
TLDR
This paper provides tight sample complexity bounds for Best-Arm Identification under bounded systematic bias, showing that safe pruning requires the reward gap Δ to exceed 4L.
Key contributions
- Frames node expansion in reasoning as a localized Best-Arm Identification (BAI) problem with bounded systematic bias (L).
- Establishes an upper bound on sample complexity of O((Δ − 4L)^-2), implying that safe pruning requires a reward gap Δ > 4L.
- Provides a matching information-theoretic lower bound of Ω((Δ − 2L)^-2), confirming the fundamental limits of biased search.
- Demonstrates that adhering to these safety bounds preserves optimal trajectories and improves sample efficiency.
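The safety condition above can be sketched in code. This is a minimal illustration, not the paper's algorithm: the function name, the Hoeffding-style confidence radius, and the sub-Gaussian reward assumption are all my additions; only the 4L bias threshold comes from the paper's upper bound.

```python
import math

def can_safely_eliminate(gap_hat: float, n: int, L: float, delta: float = 0.05) -> bool:
    """Hypothetical elimination check for BAI under a bias budget L.

    Eliminate an arm only when the observed reward gap clears both the
    paper's 4L bias threshold and a Hoeffding-style confidence width
    (assumed sub-Gaussian rewards, n samples per arm).
    """
    conf = math.sqrt(2 * math.log(2 / delta) / n)  # per-arm confidence radius
    return gap_hat > 4 * L + 2 * conf

# With bias budget L = 0.05, a gap of 0.3 is not actionable at small n;
# only once the confidence width shrinks does the rule fire:
# can_safely_eliminate(0.3, 10, 0.05)     -> False
# can_safely_eliminate(0.3, 10_000, 0.05) -> True
```

Note that as n grows the confidence term vanishes but the 4L term does not: no amount of sampling makes elimination safe when Δ ≤ 4L, which is the structural limit the paper formalizes.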
Why it matters
This work addresses a critical problem in autonomous reasoning: pruning safely when the evaluator, such as an LLM, is systematically biased. By providing both formal guarantees and fundamental limits for decision-making under bias, it offers a principled way to manage computational budgets in complex search spaces.
Original Abstract
As search depth increases in autonomous reasoning and embodied planning, the candidate action space expands exponentially, heavily taxing computational budgets. While heuristic pruning is a common countermeasure, it operates without formal safety guarantees when surrogate models (like LLMs) exhibit systematic evaluation biases. This paper frames the node expansion process as a localized Best-Arm Identification (BAI) problem over dynamic frontiers, subject to a bounded systematic bias $L$. By inverting the Lambert W function, we establish an additive sample complexity of $\mathcal{O}((\Delta-4L)^{-2})$, which indicates that safe node elimination is only feasible when the empirical reward gap exceeds $4L$. We complement this with an information-theoretic lower bound of $\Omega((\Delta-2L)^{-2})$ to confirm the structural limits of biased search. Subsequent evaluations on both synthetic trees and complex reasoning tasks demonstrate that adhering to this local safety boundary successfully preserves optimal trajectories while maximizing sample allocation efficiency.