arXiv TLDR

Sharp description of local minima in the loss landscape of high-dimensional two-layer ReLU neural networks

arXiv:2604.09412

Jie Huang, Bruno Loureiro, Stefano Sarao Mannelli

stat.ML · cond-mat.dis-nn · cs.LG

TLDR

This paper gives a sharp, low-dimensional description of local minima in the loss landscape of two-layer ReLU networks, links them to one-pass SGD dynamics, and reveals a hierarchical structure of minima.

Key contributions

  • Local minima in two-layer ReLU networks admit an exact low-dimensional representation in terms of summary statistics.
  • Links local minima to attractive fixed points of one-pass SGD dynamics in summary-statistics space.
  • Reveals a hierarchical structure of minima: isolated in the well-specified regime, connected by flat directions in the overparameterized regime.
  • Global minima become more accessible in overparameterized networks, reducing spurious solutions.
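The link between minima and one-pass SGD fixed points (second bullet) can be illustrated with a minimal simulation: a student $\sum_k \mathrm{ReLU}(w_k^\top x)$ trained online against a teacher of the same form, with each Gaussian sample used exactly once. The dimensions, step size, and squared loss here are illustrative assumptions, not the paper's exact protocol; the summary statistics are the student-teacher and student-student overlap matrices.

```python
import numpy as np

# One-pass (online) SGD sketch for the realisable teacher-student setting.
rng = np.random.default_rng(0)
d, K, lr, T = 50, 2, 0.5 / 50, 100_000
W_star = rng.normal(size=(K, d)) / np.sqrt(d)  # teacher weights (fixed)
W = rng.normal(size=(K, d)) / np.sqrt(d)       # student initialisation

losses = []
for t in range(T):
    x = rng.normal(size=d)                     # fresh sample: used once, then discarded
    pre = W @ x
    err = np.maximum(pre, 0).sum() - np.maximum(W_star @ x, 0).sum()
    losses.append(0.5 * err * err)             # instantaneous squared loss
    # gradient of 0.5*err^2 w.r.t. W: ReLU gates the error onto active rows
    W -= lr * err * ((pre > 0).astype(float)[:, None] * x[None, :])

# Summary statistics: the high-dimensional dynamics close on these overlaps.
M = W @ W_star.T   # student-teacher overlaps (K x K)
Q = W @ W.T        # student-student overlaps (K x K)
```

Tracking `M` and `Q` along the trajectory (rather than `W` itself) is what reduces the dynamics to a low-dimensional system whose attractive fixed points correspond to the local minima described in the paper.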

Why it matters

This paper provides a sharp, interpretable characterization of local minima in ReLU networks. It reveals how common simplifying assumptions can miss essential features of the loss landscape, offering a deeper understanding of optimization dynamics.

Original Abstract

We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical structure of minima: they are typically isolated in the well-specified regime, but become connected by flat directions as network width increases. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and reducing convergence to spurious solutions. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.
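The abstract's claim that the landscape reduces exactly to summary statistics rests on a standard fact for Gaussian covariates: the correlation of two ReLU units, $\mathbb{E}[\mathrm{ReLU}(u^\top x)\,\mathrm{ReLU}(v^\top x)]$, depends only on the norms of $u, v$ and the angle between them (the degree-1 arc-cosine kernel), so the population loss is a function of pairwise overlaps alone. A sketch, assuming a squared population loss; the kernel formula is classical and not specific to this paper:

```python
import numpy as np

def relu_gauss_corr(u, v):
    """E[ReLU(u.x) ReLU(v.x)] for x ~ N(0, I_d): the degree-1 arc-cosine
    kernel, a function of the overlaps (norms and angle) only."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    c = np.clip(u @ v / (nu * nv), -1.0, 1.0)
    th = np.arccos(c)
    return nu * nv * (np.sin(th) + (np.pi - th) * c) / (2.0 * np.pi)

def population_loss(W, W_star):
    """0.5 * E[(f(x) - f*(x))^2] for f(x) = sum_k ReLU(w_k.x), computed
    exactly from summary statistics -- no data is needed."""
    loss = 0.0
    for a in W:
        for b in W:
            loss += relu_gauss_corr(a, b)
    for a in W_star:
        for b in W_star:
            loss += relu_gauss_corr(a, b)
    for a in W:
        for b in W_star:
            loss -= 2.0 * relu_gauss_corr(a, b)
    return 0.5 * loss

# Monte Carlo sanity check of the closed form
rng = np.random.default_rng(0)
d, K = 20, 3
W = rng.normal(size=(K, d)) / np.sqrt(d)
W_star = rng.normal(size=(K, d)) / np.sqrt(d)
X = rng.normal(size=(200_000, d))
f = np.maximum(X @ W.T, 0).sum(axis=1)
f_star = np.maximum(X @ W_star.T, 0).sum(axis=1)
mc = 0.5 * np.mean((f - f_star) ** 2)
exact = population_loss(W, W_star)
```

Because `population_loss` depends on the weights only through inner products, the entire landscape (and its minima) can be analysed in the low-dimensional space of overlap matrices rather than in the full weight space.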
