Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier
Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Michal Valko, Vianney Perchet
TLDR
This paper shows that log-barrier-regularized online mirror descent achieves the optimal Õ(t^{-1/4}) last-iterate convergence rate in uncoupled zero-sum matrix games with bandit feedback.
Key contributions
- Achieves the optimal Õ(t^{-1/4}) last-iterate convergence rate in zero-sum matrix games with bandit feedback.
- Combines log-barrier regularization with a novel dual-focused analysis.
- Matches, with high probability, the Ω(t^{-1/4}) lower bound for uncoupled players.
- Extends the approach to extensive-form games while preserving the same optimal rate.
Why it matters
This paper closes the gap between known upper bounds and the Ω(t^{-1/4}) lower bound for last-iterate convergence in uncoupled zero-sum matrix games. Matching this lower bound shows that uncoupled players can learn minimax policies at the optimal rate even under bandit feedback, improving the efficiency of learning in competitive environments.
Original Abstract
We study the problem of learning minimax policies in zero-sum matrix games. Fiegel et al. (2025) recently showed that achieving last-iterate convergence in this setting is harder when the players are uncoupled, by proving a lower bound on the exploitability gap of Ω(t^{-1/4}). Some online mirror descent algorithms were proposed in the literature for this problem, but none have truly attained this rate yet. We show that the use of a log-barrier regularization, along with a dual-focused analysis, allows this Õ(t^{-1/4}) convergence with high-probability. We additionally extend our idea to the setting of extensive-form games, proving a bound with the same rate.
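To make the setup concrete, here is a minimal sketch of uncoupled self-play with online mirror descent under the log-barrier regularizer R(p) = -Σ_i log p_i. This is an illustrative reconstruction, not the authors' exact algorithm: the game matrix, step size, horizon, and the bisection solver for the update's normalizing multiplier are all assumptions. Each player samples an action from its current mixed strategy, observes only the resulting payoff (bandit feedback), and updates from an importance-weighted loss estimate.

```python
import numpy as np

def lb_omd_step(x, g, eta):
    """One OMD step on the simplex with log-barrier regularizer
    R(p) = -sum_i log(p_i). The first-order conditions give
    x+_i = 1 / (1/x_i + eta*g_i + lam), where lam is chosen by
    bisection so that the new point sums to one."""
    c = 1.0 / x + eta * g
    # sum_i 1/(c_i + lam) is decreasing in lam: it blows up near
    # lam = -min(c) and falls below 1 by lam = -min(c) + len(x).
    lo, hi = -c.min() + 1e-12, -c.min() + len(x)
    for _ in range(100):
        lam = 0.5 * (lo + hi)
        if (1.0 / (c + lam)).sum() > 1.0:
            lo = lam
        else:
            hi = lam
    p = 1.0 / (c + 0.5 * (lo + hi))
    return p / p.sum()

# Uncoupled self-play on matching pennies (an assumed toy instance);
# the row player maximizes, the column player minimizes.
rng = np.random.default_rng(0)
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x = np.full(2, 0.5)          # row player's mixed strategy
y = np.full(2, 0.5)          # column player's mixed strategy
eta = 0.05                   # assumed constant step size
for t in range(2000):
    i = rng.choice(2, p=x)
    j = rng.choice(2, p=y)
    r = A[i, j]              # the only feedback either player observes
    gx = np.zeros(2); gx[i] = -r / x[i]   # importance-weighted loss
    gy = np.zeros(2); gy[j] = r / y[j]    # estimates for each player
    x = lb_omd_step(x, gx, eta)
    y = lb_omd_step(y, gy, eta)

# Exploitability gap of the last iterate; it is 0 exactly at the
# (1/2, 1/2) equilibrium of matching pennies.
gap = (A @ y).max() - (x @ A).min()
```

The log-barrier keeps iterates strictly inside the simplex, which bounds the variance of the importance-weighted estimates; the paper's analysis of why this yields the optimal high-probability last-iterate rate is the technical contribution summarized above.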