Pierre Menard
4 papers · Latest:
Statistical Machine Learning
A single algorithm for both restless and rested rotting bandits
RAW-UCB is a novel algorithm that achieves near-optimal regret in both restless and rested rotting bandit settings, unifying previously distinct problems.
2604.21432
Machine Learning
Planning in entropy-regularized Markov decision processes and games
SmoothCruiser is a new planning algorithm for entropy-regularized MDPs and games, achieving Õ(1/ε^4) sample complexity.
2604.19695
Machine Learning
The Harder Path: Last Iterate Convergence for Uncoupled Learning in Zero-Sum Games with Bandit Feedback
New algorithms achieve optimal last-iterate convergence rates for uncoupled learning in zero-sum games with bandit feedback, despite inherent challenges.
2604.16087
Machine Learning
Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier
This paper uses log-barrier regularization to achieve optimal Õ(t^{-1/4}) last-iterate convergence in zero-sum matrix games.
2604.15242