Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems
Yiyang Shen, Yutian He, Weiran Wang, Qihang Lin
TLDR
This paper introduces penalty-based first-order methods for bilevel optimization with minimax lower-level problems, achieving improved complexity bounds.
Key contributions
- Develops penalty-based first-order methods for bilevel minimax optimization without strong convexity.
- Achieves $\tilde{O}(\varepsilon^{-4})$ oracle complexity for deterministic settings to find an $\varepsilon$-KKT point.
- Reformulates bilevel problems with convex constrained lower-level minimization as special cases of the framework via Lagrangian duality, yielding $\tilde{O}(\varepsilon^{-4})$ complexity and improving on the existing $\tilde{O}(\varepsilon^{-7})$ bound.
- Extends to stochastic settings, finding a nearly $\varepsilon$-KKT point with $\tilde{O}(\varepsilon^{-9})$ complexity.
Why it matters
This work addresses a gap in the bilevel optimization literature by handling minimax lower-level problems, a structure that arises in emerging applications but is not covered by existing methods, which assume lower-level minimization and often strong convexity. The proposed first-order algorithms come with improved complexity guarantees in both deterministic and stochastic settings.
Original Abstract
We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here. To address this gap, we develop penalty-based first-order methods for bilevel minimax optimization without requiring strong convexity of the lower-level problem. In the deterministic setting, we establish that the proposed method finds an $\varepsilon$-KKT point with $\tilde{O}(\varepsilon^{-4})$ oracle complexity. We further show that bilevel problems with convex constrained lower-level minimization can be reformulated as special cases of our framework via Lagrangian duality, leading to an $\tilde{O}(\varepsilon^{-4})$ complexity bound that improves upon the existing $\tilde{O}(\varepsilon^{-7})$ result. Finally, we extend our approach to the stochastic setting, where only stochastic gradient oracles are available, and prove that the proposed stochastic method finds a nearly $\varepsilon$-KKT point with $\tilde{O}(\varepsilon^{-9})$ oracle complexity.
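For context, the penalty idea in the abstract can be illustrated with a generic sketch; all symbols below are illustrative and not taken from the paper. A bilevel problem whose lower level is a minimax problem can be written as

$$\min_{x}\; F\big(x,\, y^*(x),\, z^*(x)\big) \quad \text{s.t.}\quad \big(y^*(x),\, z^*(x)\big) \text{ is a saddle point of } g(x, \cdot, \cdot).$$

A standard penalty-based reformulation replaces the lower-level optimality constraint with a saddle-point (duality) gap term weighted by a penalty parameter $\rho > 0$:

$$\min_{x,\, y,\, z}\; F(x, y, z) \;+\; \rho \Big( \max_{z'}\, g(x, y, z') \;-\; \min_{y'}\, g(x, y', z) \Big).$$

The penalized term is nonnegative, since $\max_{z'} g(x, y, z') \ge g(x, y, z) \ge \min_{y'} g(x, y', z)$, and under convex-concave structure it vanishes exactly when $(y, z)$ is a saddle point of $g(x, \cdot, \cdot)$. This is a generic penalty construction for exposition and may differ from the paper's exact formulation and parameter choices.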