ArXiv TLDR

Geometric regularization of autoencoders via observed stochastic dynamics

arXiv: 2604.16282

Sean Hill, Felix X.-F. Ye

cs.LG, math.DS, math.PR

TLDR

This paper introduces a geometrically regularized autoencoder pipeline that accurately learns low-dimensional stochastic dynamics from high-dimensional data.

Key contributions

  • Utilizes ambient covariance to extract coordinate-invariant tangent-space information for regularization.
  • Proposes tangent-bundle and inverse-consistency penalties for a three-stage autoencoder pipeline (chart learning, latent drift, latent diffusion); a rough sketch of both penalties follows this list.
  • Derives an encoder-pullback target for drift, correcting systematic errors in standard decoder-side formulas.
  • Reduces radial mean first-passage time (MFPT) error by 50-70% under rotation dynamics and cuts end-to-end ambient coefficient errors by up to an order of magnitude relative to an unregularized autoencoder.
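To make the two penalties concrete, here is a minimal PyTorch sketch. The function names, the exact loss forms, and the use of the top eigenvectors of the ambient covariance Λ as the tangent-space estimate are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the paper's code) of a tangent-bundle penalty
# and an inverse-consistency penalty for an autoencoder chart.
import torch

def inverse_consistency_penalty(encoder, decoder, z):
    # Penalize the latent round trip: encoder(decoder(z)) should return z.
    return ((encoder(decoder(z)) - z) ** 2).sum(dim=-1).mean()

def tangent_bundle_penalty(decoder, z, Lambda, d):
    # Encourage the decoder Jacobian at one latent point z (shape (d,)) to map
    # into range(Lambda), using the top-d eigenvectors of the (D, D) ambient
    # covariance estimate Lambda as the tangent-space basis.
    J = torch.autograd.functional.jacobian(decoder, z)  # shape (D, d)
    _, evecs = torch.linalg.eigh(Lambda)                 # eigenvalues ascending
    normal_dirs = evecs[:, :-d]                          # complement of the top-d directions
    # Squared Frobenius norm of the Jacobian's component along the normal directions.
    return (normal_dirs.T @ J).pow(2).sum()
```

In a training loop these two terms would be weighted and added to the usual reconstruction loss of the chart-learning stage.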

Why it matters

Modeling high-dimensional stochastic systems that effectively evolve on unknown low-dimensional manifolds is challenging. This work provides a robust autoencoder framework that uses geometric constraints to substantially improve the accuracy of the learned dynamics, which is essential for building reliable reduced simulators.

Original Abstract

Stochastic dynamical systems with slow or metastable behavior evolve, on long time scales, on an unknown low-dimensional manifold in high-dimensional ambient space. Building a reduced simulator from short-burst ambient ensembles is a long-standing problem: local-chart methods like ATLAS suffer from exponential landmark scaling and per-step reprojection, while autoencoder alternatives leave tangent-bundle geometry poorly constrained, and the errors propagate into the learned drift and diffusion. We observe that the ambient covariance $\Lambda$ already encodes coordinate-invariant tangent-space information, its range spanning the tangent bundle. Using this, we construct a tangent-bundle penalty and an inverse-consistency penalty for a three-stage pipeline (chart learning, latent drift, latent diffusion) that learns a single nonlinear chart and the latent SDE. The penalties induce a function-space metric, the $\rho$-metric, strictly weaker than the Sobolev $H^1$ norm yet achieving the same chart-quality generalization rate up to logarithmic factors. For the drift, we derive an encoder-pullback target via Itô's formula on the learned encoder and prove a bias decomposition showing the standard decoder-side formula carries systematic error for any imperfect chart. Under a $W^{2,\infty}$ chart-convergence assumption, chart-level error propagates controllably to weak convergence of the ambient dynamics and to convergence of radial mean first-passage times. Experiments on four surfaces embedded in up to $201$ ambient dimensions reduce radial MFPT error by $50$--$70\%$ under rotation dynamics and achieve the lowest inter-well MFPT error on most surface--transition pairs under metastable Müller--Brown Langevin dynamics, while reducing end-to-end ambient coefficient errors by up to an order of magnitude relative to an unregularized autoencoder.
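As a sketch of the Itô step behind the encoder-pullback drift target (generic notation of ours, assuming an ambient SDE with drift $b$, diffusion $\Sigma$, $a := \Sigma\Sigma^{\top}$, and a learned encoder $\varphi_e$; the paper's exact target may differ):

```latex
% Itô's formula applied coordinate-wise to the latent process Z_t = \varphi_e(X_t),
% where dX_t = b(X_t)\,dt + \Sigma(X_t)\,dW_t:
dZ_t^{(k)} = \Big[ \nabla\varphi_e^{(k)}(X_t)^{\top} b(X_t)
  + \tfrac{1}{2}\,\operatorname{tr}\!\big( a(X_t)\,\nabla^2\varphi_e^{(k)}(X_t) \big) \Big]\, dt
  + \nabla\varphi_e^{(k)}(X_t)^{\top} \Sigma(X_t)\, dW_t .
```

The bracketed drift term, evaluated on observed ambient states, is the kind of encoder-side target the abstract refers to; per the abstract, the standard decoder-side formula instead carries a systematic bias whenever the chart is imperfect.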
