Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems
Yiyang Shen, Yutian He, Weiran Wang, Qihang Lin
TLDR
This paper introduces penalty-based first-order methods for bilevel optimization with minimax lower-level problems, achieving improved complexity bounds.
Key contributions
- Develops penalty-based first-order methods for bilevel minimax optimization without strong convexity.
- Achieves $\tilde{O}(\varepsilon^{-4})$ oracle complexity for deterministic settings to find an $\varepsilon$-KKT point.
- Reformulates bilevel problems with convex constrained lower-level minimization as special cases of the framework via Lagrangian duality, yielding $\tilde{O}(\varepsilon^{-4})$ complexity and improving on the existing $\tilde{O}(\varepsilon^{-7})$ bound.
- Extends to stochastic settings, finding a nearly $\varepsilon$-KKT point with $\tilde{O}(\varepsilon^{-9})$ complexity.
Why it matters
This work addresses a gap in the bilevel optimization literature by handling minimax lower-level problems, a structure that arises in emerging applications but is not covered by existing methods, which assume lower-level minimization and often strong convexity. The proposed first-order algorithms come with improved complexity guarantees in both deterministic and stochastic settings.
Original Abstract
We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here. To address this gap, we develop penalty-based first-order methods for bilevel minimax optimization without requiring strong convexity of the lower-level problem. In the deterministic setting, we establish that the proposed method finds an $\varepsilon$-KKT point with $\tilde{O}(\varepsilon^{-4})$ oracle complexity. We further show that bilevel problems with convex constrained lower-level minimization can be reformulated as special cases of our framework via Lagrangian duality, leading to an $\tilde{O}(\varepsilon^{-4})$ complexity bound that improves upon the existing $\tilde{O}(\varepsilon^{-7})$ result. Finally, we extend our approach to the stochastic setting, where only stochastic gradient oracles are available, and prove that the proposed stochastic method finds a nearly $\varepsilon$-KKT point with $\tilde{O}(\varepsilon^{-9})$ oracle complexity.
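For context, the penalty idea in the abstract can be illustrated with a generic sketch; all symbols below are illustrative and not taken from the paper. A bilevel problem whose lower level is a minimax problem can be written as

$$\min_{x}\; F\big(x,\, y^*(x),\, z^*(x)\big) \quad \text{s.t.}\quad \big(y^*(x),\, z^*(x)\big) \text{ is a saddle point of } g(x, \cdot, \cdot).$$

A standard penalty-based reformulation replaces the lower-level optimality constraint with a saddle-point (duality) gap term weighted by a penalty parameter $\rho > 0$:

$$\min_{x,\, y,\, z}\; F(x, y, z) \;+\; \rho \Big( \max_{z'}\, g(x, y, z') \;-\; \min_{y'}\, g(x, y', z) \Big).$$

The penalized term is nonnegative, since $\max_{z'} g(x, y, z') \ge g(x, y, z) \ge \min_{y'} g(x, y', z)$, and under convex-concave structure it vanishes exactly when $(y, z)$ is a saddle point of $g(x, \cdot, \cdot)$. This is a generic penalty construction for exposition and may differ from the paper's exact formulation and parameter choices.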