ArXiv TLDR

Calibrating Scientific Foundation Models with Inference-Time Stochastic Attention

arXiv: 2604.19530

Akash Yadav, Taiwo A. Adebiyi, Ruda Zhang

cs.LG · cs.CE · stat.ML

TLDR

Stochastic Attention improves uncertainty calibration in scientific foundation models by randomizing attention weights at inference time, with no retraining required.

Key contributions

  • Proposes Stochastic Attention, an inference-time modification for calibrated uncertainty in Transformers.
  • Randomizes attention weights using multinomial samples, generating predictive ensembles without retraining.
  • Introduces a calibration objective for efficient post-hoc tuning of the concentration parameter.
  • Achieves superior calibration and sharper prediction intervals with minimal tuning compared to baselines.
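The core mechanism, as described above, replaces softmax attention weights with normalized multinomial samples governed by a single concentration parameter. The paper's exact implementation is not reproduced here; the following is a minimal NumPy sketch under that description, where `stochastic_attention`, the score shapes, and the parameter name `c` are illustrative assumptions:

```python
import numpy as np

def stochastic_attention(scores, c, rng):
    """Sketch (not the authors' code): randomize attention by replacing
    softmax weights with normalized multinomial samples.

    scores: (n_queries, n_keys) attention logits
    c: concentration parameter -- larger c means lower-variance weights
       that approach the deterministic softmax
    """
    # standard numerically stable softmax over keys
    p = np.exp(scores - scores.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    # draw c multinomial trials per query row, then normalize the counts
    counts = np.stack([rng.multinomial(c, row) for row in p])
    return counts / c

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 8))
w = stochastic_attention(scores, c=256, rng=rng)  # rows still sum to 1
```

Each call draws a fresh sample, so repeated forward passes with the same inputs yield an ensemble of predictions; the single parameter `c` trades off between deterministic softmax behavior (large `c`) and high-variance attention (small `c`).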

Why it matters

Scientific foundation models need reliable uncertainty estimates for high-stakes applications. This paper offers a novel, efficient method to achieve calibrated uncertainty without costly retraining. Its lightweight nature and strong performance make it highly practical for real-world deployment.

Original Abstract

Transformer-based scientific foundation models are increasingly deployed in high-stakes settings, but current architectures give deterministic outputs and provide limited support for calibrated predictive uncertainty. We propose Stochastic Attention, a lightweight inference-time modification that randomizes attention by replacing softmax weights with normalized multinomial samples controlled by a single concentration parameter, and produces predictive ensembles without retraining. To set this parameter, we introduce a calibration objective that matches the stochastic attention output with the target, yielding an efficient univariate post-hoc tuning problem. We evaluate this mechanism on two scientific foundation models for weather and time-series forecasting along with an additional regression task. Across benchmarks against uncertainty-aware baselines, we find that Stochastic Attention achieves the strongest native calibration and the sharpest prediction intervals at comparable coverage, while requiring only minutes of post-hoc tuning versus days of retraining for competitive baselines.
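The abstract's "predictive ensembles" and "prediction intervals at comparable coverage" follow the usual Monte Carlo recipe: run the stochastic model several times and take empirical quantiles. A generic sketch of that step, where `forward_pass` is a hypothetical stand-in for one stochastic forward pass of the model (not an API from the paper):

```python
import numpy as np

def ensemble_interval(forward_pass, n_samples, alpha=0.1):
    """Sketch: form a predictive ensemble from repeated stochastic forward
    passes and report the empirical mean plus a (1 - alpha) prediction
    interval from the sample quantiles."""
    samples = np.array([forward_pass() for _ in range(n_samples)])
    lo, hi = np.quantile(samples, [alpha / 2, 1 - alpha / 2], axis=0)
    return samples.mean(axis=0), lo, hi

# toy stand-in: each "forward pass" returns a 3-dim stochastic prediction
rng = np.random.default_rng(0)
mean, lo, hi = ensemble_interval(lambda: rng.normal(size=3), n_samples=500)
```

Because the only new cost is repeated inference, the abstract's "minutes of post-hoc tuning" claim reduces to a univariate search over the concentration parameter against a calibration objective, rather than any gradient-based retraining.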
