ArXiv TLDR

Calibrating Scientific Foundation Models with Inference-Time Stochastic Attention

arXiv: 2604.19530

Akash Yadav, Taiwo A. Adebiyi, Ruda Zhang

cs.LG · cs.CE · stat.ML

TLDR

Stochastic Attention improves uncertainty calibration in scientific foundation models by randomizing attention weights at inference time, with no retraining required.

Key contributions

  • Proposes Stochastic Attention, an inference-time modification for calibrated uncertainty in Transformers.
  • Randomizes attention weights using multinomial samples, generating predictive ensembles without retraining.
  • Introduces a calibration objective for efficient post-hoc tuning of the concentration parameter.
  • Achieves superior calibration and sharper prediction intervals with minimal tuning compared to baselines.
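The core mechanism, as described above, replaces softmax attention weights with normalized multinomial samples governed by a single concentration parameter. The paper's exact implementation is not reproduced here; the following is a minimal NumPy sketch under that description, where `stochastic_attention`, the score shapes, and the parameter name `c` are illustrative assumptions:

```python
import numpy as np

def stochastic_attention(scores, c, rng):
    """Sketch (not the authors' code): randomize attention by replacing
    softmax weights with normalized multinomial samples.

    scores: (n_queries, n_keys) attention logits
    c: concentration parameter -- larger c means lower-variance weights
       that approach the deterministic softmax
    """
    # standard numerically stable softmax over keys
    p = np.exp(scores - scores.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    # draw c multinomial trials per query row, then normalize the counts
    counts = np.stack([rng.multinomial(c, row) for row in p])
    return counts / c

rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 8))
w = stochastic_attention(scores, c=256, rng=rng)  # rows still sum to 1
```

Each call draws a fresh sample, so repeated forward passes with the same inputs yield an ensemble of predictions; the single parameter `c` trades off between deterministic softmax behavior (large `c`) and high-variance attention (small `c`).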

Why it matters

Scientific foundation models need reliable uncertainty estimates for high-stakes applications. This paper offers a novel, efficient method to achieve calibrated uncertainty without costly retraining. Its lightweight nature and strong performance make it highly practical for real-world deployment.

Original Abstract

Transformer-based scientific foundation models are increasingly deployed in high-stakes settings, but current architectures give deterministic outputs and provide limited support for calibrated predictive uncertainty. We propose Stochastic Attention, a lightweight inference-time modification that randomizes attention by replacing softmax weights with normalized multinomial samples controlled by a single concentration parameter, and produces predictive ensembles without retraining. To set this parameter, we introduce a calibration objective that matches the stochastic attention output with the target, yielding an efficient univariate post-hoc tuning problem. We evaluate this mechanism on two scientific foundation models for weather and time-series forecasting along with an additional regression task. Across benchmarks against uncertainty-aware baselines, we find that Stochastic Attention achieves the strongest native calibration and the sharpest prediction intervals at comparable coverage, while requiring only minutes of post-hoc tuning versus days of retraining for competitive baselines.
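The abstract's "predictive ensembles" and "prediction intervals at comparable coverage" follow the usual Monte Carlo recipe: run the stochastic model several times and take empirical quantiles. A generic sketch of that step, where `forward_pass` is a hypothetical stand-in for one stochastic forward pass of the model (not an API from the paper):

```python
import numpy as np

def ensemble_interval(forward_pass, n_samples, alpha=0.1):
    """Sketch: form a predictive ensemble from repeated stochastic forward
    passes and report the empirical mean plus a (1 - alpha) prediction
    interval from the sample quantiles."""
    samples = np.array([forward_pass() for _ in range(n_samples)])
    lo, hi = np.quantile(samples, [alpha / 2, 1 - alpha / 2], axis=0)
    return samples.mean(axis=0), lo, hi

# toy stand-in: each "forward pass" returns a 3-dim stochastic prediction
rng = np.random.default_rng(0)
mean, lo, hi = ensemble_interval(lambda: rng.normal(size=3), n_samples=500)
```

Because the only new cost is repeated inference, the abstract's "minutes of post-hoc tuning" claim reduces to a univariate search over the concentration parameter against a calibration objective, rather than any gradient-based retraining.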
