Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model
Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric
TLDR
This paper establishes sample complexity bounds for learning ε-optimal policies in Stochastic Shortest Path (SSP) problems, revealing that SSPs may not be learnable at all when the minimum cost is zero.
Key contributions
- Derives a lower bound of Ω(SAB_star^3 / (c_min ε^2)) samples for returning an ε-optimal policy in SSP with access to a generative model.
- Shows that learning in SSPs is strictly harder than in the finite-horizon and discounted settings: when the minimum cost c_min is zero, an SSP instance may not be learnable at all.
- Proposes two algorithms that match the lower bound up to logarithmic factors: one for the general case, and one that handles c_min = 0 under the condition that the optimal policy has a bounded hitting time to the goal state.
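The dependence on the minimum cost in the lower bound above makes the unlearnability claim concrete: holding the other quantities fixed, the required number of samples blows up as the minimum cost vanishes (symbols as defined in the abstract below):

```latex
N = \Omega\!\left(\frac{S A B_{\star}^{3}}{c_{\min}\,\varepsilon^{2}}\right),
\qquad
N \to \infty \quad \text{as } c_{\min} \to 0 .
```

This is why the paper needs an extra structural assumption (a bounded hitting time for the optimal policy) to recover a matching upper bound in the c_min = 0 regime.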
Why it matters
This research provides fundamental insight into the learnability of Stochastic Shortest Path problems, highlighting a critical obstacle when minimum costs are zero. It offers both theoretical limits and algorithms that match them, sharpening our understanding of goal-oriented reinforcement learning.
Original Abstract
We study the sample complexity of learning an $ε$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner has access to a generative model. We show that there exists a worst-case SSP instance with $S$ states, $A$ actions, minimum cost $c_{\min}$, and maximum expected cost of the optimal policy over all states $B_{\star}$, where any algorithm requires at least $Ω(SAB_{\star}^3/(c_{\min}ε^2))$ samples to return an $ε$-optimal policy with high probability. Surprisingly, this implies that whenever $c_{\min} = 0$ an SSP problem may not be learnable, thus revealing that learning in SSPs is strictly harder than in the finite-horizon and discounted settings. We complement this lower bound with an algorithm that matches it, up to logarithmic factors, in the general case, and an algorithm that matches it up to logarithmic factors even when $c_{\min} = 0$, but only under the condition that the optimal policy has a bounded hitting time to the goal state.