Free Decompression with Algebraic Spectral Curves
Siavash Ameli, Chris van der Heide, Liam Hodgkinson, Michael W. Mahoney
TLDR
This paper generalizes Free Decompression (FD), a method for extrapolating spectral information from smaller matrices to larger ones. It uses algebraic spectral curves so that FD applies to large, realistic ML models.
Key contributions
- Generalizes Free Decompression (FD) using algebraic spectral curve theory, relaxing the strong assumptions that limited the original FD method.
- Recasts FD as an evolution along spectral curves that can be readily integrated, for any spectral density whose Stieltjes transform satisfies an algebraic relation.
- Handles spectral densities with multiple or multi-modal bulks, multiple scales, and atoms, all characteristic of real-world data and popular ML models.
- Demonstrates the framework on Hessian and activation matrices of neural networks and large-scale diffusion models.
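To make "a Stieltjes transform satisfying an algebraic relation" concrete, here is a small Python sketch. It is our illustration, not the authors' code. It uses the Marchenko-Pastur law, whose Stieltjes transform m(z) = ∫ρ(x)/(x - z) dx satisfies the quadratic P(z, m) = c·z·m² + (z + c - 1)·m + 1 = 0, so the pair (z, m) lies on an algebraic curve. With aspect ratio c > 1 the law also has an atom at zero, which shows how atoms appear on the same curve. The function names and parameter values (c = 2, the offsets eps) are hypothetical choices made for this example.

```python
import numpy as np


def mp_curve_coeffs(z: complex, c: float) -> list:
    """Coefficients in m (highest degree first) of the algebraic relation
    P(z, m) = c*z*m**2 + (z + c - 1)*m + 1 = 0, which the Stieltjes transform
    m(z) = int rho(x) / (x - z) dx of the Marchenko-Pastur law
    (aspect ratio c, unit variance) satisfies."""
    return [c * z, z + c - 1.0, 1.0]


def stieltjes(z: complex, c: float) -> complex:
    """Solve P(z, m) = 0 for m and keep the physical branch.

    For Im z > 0 the Stieltjes transform of a probability measure has
    Im m(z) > 0, and for this quadratic the other root has Im m < 0,
    so the root with the largest imaginary part is the right one."""
    roots = np.roots(mp_curve_coeffs(z, c))
    return complex(roots[np.argmax(roots.imag)])


def density(x: np.ndarray, c: float, eps: float = 1e-9) -> np.ndarray:
    """Continuous density rho(x) ~= Im m(x + i*eps) / pi just above the axis."""
    return np.array([stieltjes(xi + 1j * eps, c).imag for xi in x]) / np.pi


def atom_mass_at_zero(c: float, eps: float = 1e-6) -> float:
    """An atom of mass w at 0 contributes w / (0 - z) to m(z), so
    eps * Im m(i*eps) -> w as eps -> 0."""
    return eps * stieltjes(1j * eps, c).imag


c = 2.0  # c > 1: an atom at 0 plus a continuous bulk on [a, b]
a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2
x = np.linspace(a, b, 4001)
rho = density(x, c)

# Closed-form Marchenko-Pastur bulk, used only as a check.
rho_exact = np.sqrt(np.clip((b - x) * (x - a), 0.0, None)) / (2 * np.pi * c * x)
bulk_mass = float(np.sum(0.5 * (rho[1:] + rho[:-1]) * np.diff(x)))  # trapezoid rule

print(f"max |rho - rho_exact| = {np.max(np.abs(rho - rho_exact)):.1e}")
print(f"bulk mass {bulk_mass:.4f} vs 1/c = {1 / c}")
print(f"atom at 0 {atom_mass_at_zero(c):.4f} vs 1 - 1/c = {1 - 1 / c}")
```

The density and the atom both come out of the same polynomial relation, with no separate model for the point mass. For this quadratic, choosing the root with positive imaginary part is enough to select the physical sheet. The sketch does not address following branches on higher-degree curves.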
Why it matters
Tools from random matrix theory are central to deep learning theory, but computing spectra directly is limited by matrix size, which often restricts analysis to models too small to be realistic. By generalizing FD, this work makes it possible to infer the spectra of large neural networks and diffusion models from smaller ones. This could support the study of generalization, robustness, and failure modes in deep learning.
Original Abstract
Tools from random matrix theory have become central to deep learning theory, using spectral information to provide mechanisms for modeling generalization, robustness, scaling, and failure modes. While often capable of modeling empirical behavior, practical computations are limited by matrix size, often imposing a restriction to models that are too small to be realistic. This motivates the inference of properties of larger models from the behavior of smaller ones. Free decompression (FD) is a recently proposed method for extrapolating spectral information across matrix sizes, but its utility is currently limited by strong assumptions that preclude its implementation on more realistic machine learning (ML) models. We use algebraic spectral curve theory to provide a general FD methodology for spectral densities whose Stieltjes transform satisfies an algebraic relation, a modeling assumption that is more likely to hold in practice. This recasts FD as an evolution along spectral curves which can be readily integrated. Our framework enables the expansion of spectral densities that have multiple or multi-modal bulks, that exist at multiple scales, and that contain atoms, all characteristic of real-world data and popular ML models. We demonstrate the efficacy of our framework on models of interest in modern ML, including Hessian and activation matrices associated with neural networks and large-scale diffusion models.
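One way to see what "extrapolating spectral information across matrix sizes" means is through a standard free-probability result on compressions. Suppose a k x k matrix behaves like a random compression of an n x n one, with alpha = k/n. Then their free cumulants satisfy kappa_j(small) = alpha^(j-1) · kappa_j(large). The Python sketch below inverts that relation on a finite list of moments, assuming this compression model holds. It is a swapped-in, moment-level illustration, not the paper's method, which works with Stieltjes transforms and spectral curves. All function names are hypothetical.

```python
import numpy as np


def power_coeff(moments, s: int, deg: int) -> float:
    """Coefficient of z**deg in M(z)**s, with M(z) = sum_n moments[n] * z**n."""
    base = np.asarray(moments[: deg + 1], dtype=float)
    poly = np.array([1.0])
    for _ in range(s):
        poly = np.convolve(poly, base)[: deg + 1]  # drop powers above z**deg
    return float(poly[deg])


def moments_to_free_cumulants(m):
    """Invert M(z) = 1 + sum_s kappa_s z**s M(z)**s to get [kappa_1, ..., kappa_N]
    from moments m = [1, m_1, ..., m_N]."""
    kappa = []
    for n in range(1, len(m)):
        # Coefficient of z**n: m_n = sum_{s=1}^{n} kappa_s [z^(n-s)] M**s,
        # and the s = n term is just kappa_n.
        rest = sum(kappa[s - 1] * power_coeff(m, s, n - s) for s in range(1, n))
        kappa.append(m[n] - rest)
    return kappa


def free_cumulants_to_moments(kappa):
    """Forward map: [kappa_1, ..., kappa_N] -> [1, m_1, ..., m_N]."""
    m = [1.0]
    for n in range(1, len(kappa) + 1):
        # Only m_0 .. m_{n-1} enter the right-hand side.
        m.append(sum(kappa[s - 1] * power_coeff(m, s, n - s) for s in range(1, n + 1)))
    return m


def free_decompress_moments(m_small, alpha: float):
    """Hypothetical helper. Assumes the small k x k matrix is a free compression
    of the n x n target with alpha = k / n, so that
    kappa_j(small) = alpha**(j - 1) * kappa_j(large)."""
    kappa_small = moments_to_free_cumulants(m_small)
    kappa_large = [kj / alpha ** (j - 1) for j, kj in enumerate(kappa_small, start=1)]
    return free_cumulants_to_moments(kappa_large)


# Marchenko-Pastur moments for ratio 0.125, i.e. (in the large-size limit)
# a k x k corner of an MP(0.5) matrix with alpha = k / n = 0.25.
m_small = [1.0, 1.0, 1.125, 1.390625, 1.845703125]
print(np.round(free_decompress_moments(m_small, alpha=0.25), 6))
# approximately [1, 1, 1.5, 2.75, 5.625], the MP(0.5) moments
```

Each recovered cumulant is multiplied by alpha^-(j-1), so errors in the higher moments of the small spectrum are amplified quickly. The sketch therefore only illustrates the scaling that FD inverts. It should not be read as a practical estimator.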