Teacher Forcing as Generalized Bayes: Optimization Geometry Mismatch in Switching Surrogates for Chaotic Dynamics
Andre Herz, Daniel Durstewitz, Georgia Koppe
TLDR
This paper explores the geometry mismatch between teacher forcing and marginal likelihood in recurrent neural networks for chaotic systems.
Key contributions
- Frames identity teacher forcing (ITF) as a generalized Bayes update whose optimization geometry need not match the free-running model's marginal likelihood.
- Compares the objective-induced curvatures of ITF and the marginal likelihood in a probabilistic switching augmentation of AL-RNNs, estimating ambiguity-aware observed information via Louis' identity.
- ITF inflates curvature by conditioning on a single forced regime path, while marginal-likelihood curvature is reduced by a missing-information correction when multiple switching explanations remain plausible.
- In Lorenz-63 experiments, windowed evidence fine-tuning improves held-out evidence but can degrade dynamical quantities of interest (QoIs) relative to ITF-pretrained models.
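The ITF-versus-free-running distinction above can be illustrated with a toy sketch. The function names (`rnn_step`, `itf_loss`, `free_running_loss`), the ReLU recurrent map, and the forcing interval `tau` are illustrative assumptions, not the paper's actual AL-RNN architecture or training code: ITF periodically resets the latent state to the observation (an identity mapping), so gradients never propagate through more than `tau` chaotic steps, whereas the free-running loss iterates the model from the initial condition alone.

```python
import numpy as np

def rnn_step(z, W, b):
    """One step of a toy piecewise-linear (ReLU) recurrent map."""
    return np.maximum(W @ z, 0.0) + b

def itf_loss(x, W, b, tau=5):
    """Identity teacher forcing (sketch): every tau steps, replace the
    latent state by the observation itself, truncating error compounding."""
    z = x[0].copy()
    loss = 0.0
    for t in range(1, len(x)):
        z = rnn_step(z, W, b)
        loss += np.sum((z - x[t]) ** 2)
        if t % tau == 0:
            z = x[t].copy()  # force with the observed state
    return loss / (len(x) - 1)

def free_running_loss(x, W, b):
    """Free-running loss: iterate the model from x[0] only; on a chaotic
    system, prediction errors compound along the whole orbit."""
    z = x[0].copy()
    loss = 0.0
    for t in range(1, len(x)):
        z = rnn_step(z, W, b)
        loss += np.sum((z - x[t]) ** 2)
    return loss / (len(x) - 1)
```

With `tau` larger than the sequence length, no forcing ever occurs and the two losses coincide; small `tau` is what stabilizes training on chaotic data.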
Why it matters
This paper reveals a geometry mismatch between teacher forcing and marginal likelihood when training recurrent models of chaotic systems. Crucially for robust model building, it shows that optimizing for evidence alone can degrade key dynamical properties.
Original Abstract
Identity teacher forcing (ITF) enables stable training of deterministic recurrent surrogates for chaotic dynamical systems and has been highly effective for dynamical systems reconstruction (DSR) with recurrent neural networks (RNNs), including interpretable almost-linear RNNs (AL-RNNs). However, as an intervention-based prediction loss (and thus a generalized Bayes update), teacher forcing need not match the free-running model's marginal likelihood geometry. We compare the objective-induced curvatures of ITF and marginal likelihood in a probabilistic switching augmentation of AL-RNNs, estimating ambiguity-aware observed information via Louis' identity. In the switching setting studied here, conditioning on a single forced regime path (as ITF does) inflates curvature, while marginal likelihood curvature is reduced by a missing-information correction when multiple switching explanations remain plausible. In Lorenz-63 experiments, windowed evidence fine-tuning improves held-out evidence but can degrade dynamical quantities of interest (QoIs) relative to ITF-pretrained models.
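The abstract's curvature argument rests on Louis' identity: observed information equals the posterior-expected complete-data information minus the posterior variance of the complete-data score (the missing-information correction). A minimal sketch, assuming a toy two-regime model `y | z ~ N(z*theta, 1)` with `z` in `{+1, -1}` equally likely (not the paper's switching AL-RNN): conditioning on one forced regime path keeps the full complete-data curvature of 1, while marginalizing subtracts the score variance whenever both regimes remain plausible.

```python
import numpy as np

def louis_observed_information(y, theta):
    """Observed information for y | z ~ N(z*theta, 1), z in {+1, -1},
    via Louis' identity:
        I_obs = E_{z|y}[complete info] - Var_{z|y}[complete score].
    Here the complete-data information is 1 and the complete-data
    score is z*y - theta.
    """
    # posterior over the latent regime z given y (equal prior weights)
    log_w = np.array([-0.5 * (y - theta) ** 2, -0.5 * (y + theta) ** 2])
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # complete-data scores for z = +1 and z = -1
    scores = np.array([y - theta, -y - theta])
    e_score = w @ scores
    missing_info = w @ (scores - e_score) ** 2  # score variance = missing information
    return 1.0 - missing_info  # complete-data info minus missing-information correction

def itf_like_curvature():
    """Conditioning on a single forced regime path keeps the full
    complete-data curvature (no missing-information correction)."""
    return 1.0
```

When the two regimes explain `y` about equally well, the correction is large and the marginal curvature drops well below the single-path value, mirroring the paper's claim that ITF inflates curvature relative to the marginal likelihood.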