ArXiv TLDR

Planning in entropy-regularized Markov decision processes and games

2604.19695

Jean-Bastien Grill, Omar Darwiche Domingues, Pierre Ménard, Rémi Munos, Michal Valko

cs.LG

TLDR

SmoothCruiser is a new planning algorithm for entropy-regularized MDPs and two-player games that, given a generative model of the environment, estimates the value function with problem-independent O~(1/epsilon^4) sample complexity.

Key contributions

  • Introduces SmoothCruiser, a novel planning algorithm for value function estimation.
  • Designed for entropy-regularized Markov Decision Processes and two-player games.
  • Achieves problem-independent O~(1/epsilon^4) sample complexity for any desired accuracy epsilon.
  • Leverages the smoothness of the Bellman operator induced by entropy regularization to guarantee polynomial sample complexity.
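The smoothness the paper exploits can be illustrated with the entropy-regularized (soft) Bellman backup, where the hard max over actions is replaced by a smooth log-sum-exp. The sketch below is not SmoothCruiser itself, only a minimal illustration of the regularized operator; the array shapes, the temperature `tau`, and the backup form are assumptions for the example.

```python
import numpy as np

def soft_bellman_backup(Q, R, P, gamma=0.9, tau=0.1):
    """One entropy-regularized Bellman backup (illustrative sketch).

    Q: (S, A) action values; R: (S, A) rewards;
    P: (S, A, S) transition kernel; tau: regularization temperature.
    """
    # Smooth state value: V(s) = tau * log sum_a exp(Q[s, a] / tau),
    # computed stably by shifting with the per-state max.
    m = Q.max(axis=1, keepdims=True)
    V = m[:, 0] + tau * np.log(np.exp((Q - m) / tau).sum(axis=1))
    # Expected backup: Q'(s, a) = R(s, a) + gamma * E_{s'}[V(s')].
    return R + gamma * np.einsum('sap,p->sa', P, V)
```

Because log-sum-exp is a gamma-contraction just like the hard max, iterating this backup converges to the regularized value function, and as tau approaches 0 it recovers the standard Bellman optimality operator.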

Why it matters

This paper introduces an algorithm with guaranteed worst-case polynomial sample complexity for planning in entropy-regularized settings, whereas no algorithm with such a guarantee is known for the non-regularized case. This advancement could enable more robust and efficient planning in complex environments.

Original Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order O~(1/epsilon^4) for a desired accuracy epsilon, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.