ArXiv TLDR

Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes

2605.03921

Cyrille Kone, Kevin Jamieson

cs.LG, stat.ML

TLDR

An asymptotically optimal, computationally efficient posterior sampling algorithm is proposed for policy identification in tabular MDPs, avoiding the high computational cost and suboptimal log(1/δ) dependence of prior methods.

Key contributions

  • Proposes a randomized, computationally efficient algorithm for best policy identification in MDPs.
  • Achieves asymptotic optimality in sample complexity and posterior contraction rates.
  • Runs in O(S^2AH) per episode, matching standard model-based approaches.
  • Avoids the suboptimal polynomial dependence on log(1/δ) of prior methods such as MOCA and PEDEL, with guarantees that remain meaningful in the asymptotic regime.
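
To make the O(S^2AH) figure concrete, here is a minimal sketch of finite-horizon backward induction on a known tabular model, the planning step that standard model-based methods run each episode. It is not the authors' algorithm. The function name `backward_induction` and the array layouts for `P` and `R` are assumptions chosen for illustration.

```python
import numpy as np

def backward_induction(P, R):
    """Plan in a known finite-horizon tabular MDP (illustrative sketch).

    P: array of shape (H, S, A, S), P[h, s, a, s2] = transition probability.
    R: array of shape (H, S, A), mean reward at step h for (s, a).
    Returns a greedy policy of shape (H, S) and values V of shape (H + 1, S).
    """
    H, S, A, _ = P.shape
    V = np.zeros((H + 1, S))              # V[H] = 0 is the terminal value
    policy = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        # Q[s, a] = R[h, s, a] + sum over s2 of P[h, s, a, s2] * V[h + 1, s2].
        # This contraction costs O(S^2 A) per step, hence O(S^2 A H) in total.
        Q = R[h] + P[h] @ V[h + 1]
        policy[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return policy, V
```

Each of the H steps contracts an (S, A, S) tensor against a length-S vector, which is where the S^2 A factor comes from.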

Why it matters

This paper addresses two limitations of existing policy identification methods for MDPs: high computational cost, which makes them hard to implement, and suboptimal dependence on log(1/δ). The new algorithm is asymptotically optimal in sample complexity and runs in O(S^2AH) per episode, offering both theoretical insight and a practical tool.

Original Abstract

We study the $(\varepsilon, \delta)$-PAC policy identification problem in finite-horizon episodic Markov Decision Processes. Existing approaches provide finite-time guarantees for approximate settings ($\varepsilon>0$) but suffer from high computational cost, rendering them hard to implement, and also suffer from suboptimal dependence on $\log(1/\delta)$. We propose a randomized and computationally efficient algorithm for best policy identification that combines posterior sampling with an online learning algorithm to guide exploration in the MDP. Our method achieves asymptotic optimality in sample complexity, also in terms of posterior contraction rate, and runs in $O(S^2AH)$ per episode, matching standard model-based approaches. Unlike prior algorithms such as MOCA and PEDEL, our guarantees remain meaningful in the asymptotic regime and avoid sub-optimal polynomial dependence on $\log(1/\delta)$. Our results provide both theoretical insights and practical tools for efficient policy identification in tabular MDPs.
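
As a rough illustration of the posterior sampling ingredient mentioned in the abstract, the sketch below draws one plausible transition model from a posterior built on visit counts. It covers only generic posterior sampling for a tabular MDP. The Dirichlet prior, the `prior` pseudo-count, and the name `sample_transitions` are assumptions, and the paper's online-learning exploration rule is not reproduced here.

```python
import numpy as np

def sample_transitions(counts, prior=1.0, rng=None):
    """Draw one transition model from a Dirichlet posterior (illustrative sketch).

    counts: array of shape (H, S, A, S) with visit counts N[h, s, a, s2].
    prior:  assumed symmetric Dirichlet pseudo-count added to every entry.
    Returns an array of the same shape whose last axis sums to 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Independent Gamma(alpha, 1) draws, normalised over the next-state axis,
    # are distributed as Dirichlet(alpha) for each (h, s, a).
    g = rng.gamma(shape=counts + prior, scale=1.0)
    return g / g.sum(axis=-1, keepdims=True)
```

A sampled model has the same shape as the counts, so it can be passed directly to any tabular planner that expects transition probabilities indexed by step, state, action, and next state.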
