ArXiv TLDR

Stochastic Trust-Region Methods for Over-parameterized Models

arXiv: 2604.14017

Aike Yang, Hao Wang

math.OC, cs.LG

TLDR

This paper introduces a stochastic trust-region framework for over-parameterized models that eliminates manual step-size tuning and extends naturally to equality-constrained problems.

Key contributions

  • Proposes a unified stochastic trust-region framework for over-parameterized models (a minimal sketch follows this list).
  • Eliminates manual step-size tuning, improving the stability of stochastic optimization methods.
  • Achieves O(ε⁻² log(1/ε)) iteration and oracle complexity for finding an ε-stationary point in the unconstrained setting.
  • Extends to equality-constrained problems via a quadratic penalty, reaching an O(ε)-approximate KKT point at O(ε⁻⁴ log(1/ε)) complexity.
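The first-order variant can be viewed as the classical trust-region template driven by minibatch oracles: minimize a linear model of the loss inside a ball of radius Δ, then grow or shrink Δ depending on how well the predicted decrease matches the observed one. The radius update plays the role of the learning rate. Here is a minimal sketch, assuming minibatch loss/gradient oracles; the function names, hyperparameters, and accept/reject rule are generic illustrations, not the paper's exact algorithm:

```python
import numpy as np

def stochastic_trust_region(f, grad, x, delta=1.0, eta=0.1,
                            gamma_inc=2.0, gamma_dec=0.5,
                            delta_max=10.0, max_iter=200):
    """Generic first-order stochastic trust-region loop (illustrative).

    f and grad are assumed to return minibatch estimates of the loss
    and its gradient at x. All hyperparameter values are placeholders.
    """
    for _ in range(max_iter):
        g = grad(x)
        g_norm = np.linalg.norm(g)
        if g_norm == 0.0:
            break
        # Minimizer of the linear model f + g^T s over the ball ||s|| <= delta
        s = -(delta / g_norm) * g
        pred = delta * g_norm            # decrease predicted by the linear model
        ared = f(x) - f(x + s)           # observed decrease (stochastic estimate)
        rho = ared / pred                # agreement ratio
        if rho >= eta:                   # good agreement: accept step, widen radius
            x = x + s
            delta = min(gamma_inc * delta, delta_max)
        else:                            # poor agreement: reject step, shrink radius
            delta = gamma_dec * delta
    return x
```

Because Δ adapts automatically from the agreement ratio, no learning-rate schedule is needed; this adaptivity is the mechanism behind the "no manual step-size tuning" claim.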

Why it matters

This framework addresses a critical challenge in stochastic optimization: the sensitivity to step-size selection. By automating this process, it makes complex models easier to train and more robust. Its ability to handle constraints also broadens the applicability of stochastic methods.

Original Abstract

Under interpolation-type assumptions such as the strong growth condition, stochastic optimization methods can attain convergence rates comparable to full-batch methods, but their performance, particularly for SGD, remains highly sensitive to step-size selection. To address this issue, we propose a unified stochastic trust-region framework that eliminates manual step-size tuning and extends naturally to equality-constrained problems. For unconstrained optimization, we develop a first-order stochastic trust-region algorithm and show that, under the strong growth condition, it achieves an iteration and stochastic first-order oracle complexity of $O(\varepsilon^{-2} \log(1/\varepsilon))$ for finding an $\varepsilon$-stationary point. For equality-constrained problems, we introduce a quadratic-penalty-based stochastic trust-region method with penalty parameter $\mu$, and establish an iteration and oracle complexity of $O(\varepsilon^{-4} \log(1/\varepsilon))$ to reach an $\varepsilon$-stationary point of the penalized problem, corresponding to an $O(\varepsilon)$-approximate KKT point of the original constrained problem. Numerical experiments on deep neural network training and orthogonally constrained subspace fitting demonstrate that the proposed methods achieve performance comparable to well-tuned stochastic baselines, while exhibiting stable optimization behavior and effectively handling hard constraints without manual learning-rate scheduling.
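For the constrained setting, the abstract's quadratic-penalty construction amounts to running the same trust-region loop on the penalized objective φ(x) = f(x) + (μ/2)‖h(x)‖² for equality constraints h(x) = 0. A hedged sketch of how the penalized oracles could be wired up, reusing the loop above; the builder function and its names are illustrative, not the paper's implementation:

```python
import numpy as np

def penalized_oracles(f, grad_f, h, jac_h, mu):
    """Build oracles for the quadratic penalty
    phi(x) = f(x) + (mu/2) * ||h(x)||^2, where h(x) = 0 are the
    equality constraints. A standard construction; names and
    interfaces here are illustrative, not the paper's exact method.
    """
    def phi(x):
        r = h(x)                           # constraint residual
        return f(x) + 0.5 * mu * np.dot(r, r)

    def grad_phi(x):
        # grad phi = grad f + mu * J_h(x)^T h(x)
        return grad_f(x) + mu * jac_h(x).T @ h(x)

    return phi, grad_phi

# Illustrative usage with the trust-region loop sketched earlier:
# phi, grad_phi = penalized_oracles(f, grad, h, jac_h, mu=10.0)
# x_star = stochastic_trust_region(phi, grad_phi, x0)
```

An ε-stationary point of φ then corresponds, per the paper's analysis, to an O(ε)-approximate KKT point of the original constrained problem.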
