Pierre Menard

4 papers · Latest: April 23, 2026

A single algorithm for both restless and rested rotting bandits

RAW-UCB is a novel algorithm that achieves near-optimal regret in both restless and rested rotting bandit settings, unifying previously distinct problems.

2604.21432 · Apr 23, 2026

Machine Learning

Planning in entropy-regularized Markov decision processes and games

SmoothCruiser is a new planning algorithm for entropy-regularized MDPs and games, achieving $\tilde{O}(1/\epsilon^4)$ sample complexity.

2604.19695 · Apr 21, 2026

Machine Learning

The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback

New algorithms achieve optimal last-iterate convergence rates for uncoupled learning in zero-sum games with bandit feedback.

2604.16087 · Apr 17, 2026

Machine Learning

Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier

This paper uses log-barrier regularization to achieve optimal $\tilde{O}(t^{-1/4})$ last-iterate convergence in zero-sum matrix games.

2604.15242 · Apr 16, 2026
