ArXiv TLDR

Stochastic Scaling Limits and Synchronization by Noise in Deep Transformer Models

2604.26898

Andrea Agazzi, Giuseppe Bruno, Eloy Mosig García, Samuele Saviozzi, Marco Romito

math.PR, cs.LG, stat.ML

TLDR

This paper proves that the layerwise evolution of tokens in deep transformers converges to a continuous-time stochastic interacting particle system that exhibits synchronization by noise.

Key contributions

  • Proves that the layerwise token evolution in finite-depth, finite-width transformers converges to a continuous-time stochastic interacting particle system.
  • Identifies the SPDE governing the limiting token distribution and proves propagation of chaos as the number of tokens grows.
  • Demonstrates synchronization by noise in the limiting stochastic model, with exponential dissipation of the interaction energy on average.
  • Characterizes the activation functions under which noise-induced synchronization and energy dissipation occur.
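The synchronization phenomenon in the bullets above can be illustrated with a minimal toy simulation. All parameter choices and the specific drift/noise forms below are illustrative assumptions, not taken from the paper: tokens follow a softmax self-attention drift plus a multiplicative noise term driven by a single Brownian increment shared by all tokens, and the pairwise interaction energy decays along the path.

```python
import numpy as np

def softmax(z):
    # row-wise softmax, shifted for numerical stability
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def interaction_energy(X):
    # mean squared pairwise distance between tokens
    diffs = X[:, None, :] - X[None, :, :]
    return (diffs ** 2).sum(axis=-1).mean()

def simulate(n_tokens=8, d=4, beta=2.0, sigma=0.5, T=5.0, dt=0.01, seed=0):
    """Euler-Maruyama for a toy attention SDE with common multiplicative noise.

    Drift and noise forms are illustrative assumptions, not the paper's model.
    """
    rng = np.random.default_rng(seed)
    X = 0.5 * rng.standard_normal((n_tokens, d))
    energies = [interaction_energy(X)]
    for _ in range(int(T / dt)):
        # self-attention drift: each token moves toward its attention-weighted mean
        A = softmax(X @ X.T / np.sqrt(d))
        drift = beta * (A @ X - X)
        # common noise: the SAME scalar Brownian increment drives every token,
        # so pairwise differences are scaled by a shared random factor
        dW = np.sqrt(dt) * rng.standard_normal()
        X = X + drift * dt + sigma * X * dW
        energies.append(interaction_energy(X))
    return np.array(energies)

energies = simulate()
print(f"interaction energy: initial {energies[0]:.3f}, final {energies[-1]:.2e}")
```

With these (assumed) parameters the attention drift and the common multiplicative noise both contract pairwise differences, so the interaction energy collapses toward zero, a toy analogue of the synchronization-by-noise statement.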

Why it matters

This work provides a rigorous mathematical framework for the dynamics of deep transformer models. By establishing convergence to a stochastic interacting particle system and identifying synchronization by noise, it clarifies how token representations interact and cluster across layers. These insights could inform the design of more stable and efficient transformer architectures.

Original Abstract

We prove pathwise convergence of the layerwise evolution of tokens in a finite-depth, finite-width transformer model with MultiLayer Perceptron (MLP) blocks to a continuous-time stochastic interacting particle system. We also identify the stochastic partial differential equation describing the evolution of the tokens' distribution in this limit and prove propagation of chaos when the number of such tokens is large. The bounds we establish are quantitative and the limits we consider commute. We further prove that the limiting stochastic model displays synchronization by noise and establish exponential dissipation of the interaction energy on average, provided that the common noise is sufficiently coercive relative to the deterministic self-attention drift. We finally characterize the activation functions satisfying the former condition.
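Schematically, and with notation assumed here for illustration rather than taken from the paper, the limiting objects described in the abstract can be written as an interacting particle SDE driven by a common noise:

```latex
\mathrm{d}X^i_t = b\big(X^i_t, \mu^N_t\big)\,\mathrm{d}t
  + \sigma\big(X^i_t\big)\,\mathrm{d}W_t,
\qquad
\mu^N_t = \frac{1}{N}\sum_{j=1}^N \delta_{X^j_t},
```

where \(b\) is the deterministic self-attention drift, \(\mu^N_t\) is the empirical token distribution, and the Brownian motion \(W\) is common to all tokens. In this notation, exponential dissipation of the interaction energy on average means that

```latex
\mathcal{E}_N(t) = \frac{1}{N^2}\sum_{i,j=1}^N \big\|X^i_t - X^j_t\big\|^2
\qquad\text{satisfies}\qquad
\mathbb{E}\big[\mathcal{E}_N(t)\big] \le \mathcal{E}_N(0)\, e^{-\lambda t}
```

for some rate \(\lambda > 0\), provided the common noise is sufficiently coercive relative to the drift.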
