Decoupled PFNs: Identifiable Epistemic-Aleatoric Decomposition via Structured Synthetic Priors
Richard Bergna, Stefan Depeweg, José Miguel Hernández-Lobato
TLDR
Decoupled PFNs use structured synthetic priors to identifiably separate epistemic and aleatoric uncertainty, improving sequential decision-making in noisy settings.
Key contributions
- Shows that standard PFNs cannot separate epistemic from aleatoric uncertainty using the posterior predictive alone, which hinders efficient exploration.
- Introduces Decoupled PFNs, exploiting synthetic data to explicitly label latent signals and noise.
- Trains separate network heads for latent signal and aleatoric noise, enabling identifiable decomposition.
- Epistemic-only acquisition significantly improves performance in noisy Bayesian Optimization and HPO.
Why it matters
This paper addresses a critical limitation in Bayesian prediction for sequential decision-making. By clearly separating epistemic and aleatoric uncertainty, it enables more efficient and targeted exploration. This is crucial for active learning and Bayesian optimization, leading to better decisions in noisy environments.
Original Abstract
Prior-Fitted Networks (PFNs) amortize Bayesian prediction by meta-learning over a synthetic task prior, but their standard output is a posterior predictive distribution over noisy observations. For sequential decision-making, such as active learning and Bayesian optimization, acquisition should prioritize epistemic uncertainty about the latent signal rather than irreducible aleatoric observation noise. We show that this epistemic--aleatoric split is not identifiable in general from the posterior predictive distribution alone, even when that distribution is known exactly. We then exploit a distinctive advantage of PFNs: because the synthetic data-generating process is under our control, each task can contain an explicit latent signal and noise function, and the generator can provide query-level labels for both the noiseless target and the observation-noise variance. We use these labels to train a decoupled PFN with separate latent-signal and aleatoric heads. The observation-level predictive is induced by convolving the latent signal distribution with the learned noise model. Empirically, epistemic-only acquisition mitigates the failure mode of total-variance exploration in noisy and heteroscedastic settings. In matched comparisons, decoupled models usually improve over tuned observation-level baselines, with the clearest gains in HPO; in broader sweeps, a decoupled model obtains the best average rank in both HPO and synthetic BO.
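The decoupling idea can be illustrated with a toy sketch. Assuming both heads output Gaussian parameters (a simplification; the paper's PFN heads and function names below are hypothetical), convolving the latent-signal distribution with the noise model just adds the variances, and epistemic-only acquisition ranks candidates by the latent-signal variance alone:

```python
import numpy as np

def predictive_from_heads(latent_mean, latent_var, noise_var):
    """Gaussian case: convolving the latent-signal distribution with the
    learned observation-noise model adds the two variances."""
    return latent_mean, latent_var + noise_var

def epistemic_acquisition(latent_var):
    # Epistemic-only exploration: pick the candidate with the largest
    # latent-signal (reducible) variance, ignoring aleatoric noise.
    return int(np.argmax(latent_var))

# Toy candidate set: made-up head outputs, not values from the paper.
latent_mean = np.array([0.1, 0.5, 0.3])
latent_var  = np.array([0.02, 0.20, 0.05])   # epistemic head
noise_var   = np.array([0.50, 0.01, 0.30])   # aleatoric head

mean, total_var = predictive_from_heads(latent_mean, latent_var, noise_var)
print(epistemic_acquisition(latent_var))  # -> 1 (highest epistemic variance)
print(int(np.argmax(total_var)))          # -> 0 (total variance is dominated by noise)
```

The contrast between the two printed indices is the failure mode the paper targets: total-variance exploration is drawn to the noisiest candidate, while the epistemic-only rule queries where the model is actually uncertain about the signal.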