ArXiv TLDR

Regularizing Attention Scores with Bootstrapping

arXiv: 2604.01339

Neo Christopher Chung, Maxim Laletin

cs.CV · cs.AI · cs.LG · stat.ME · stat.ML

TLDR

This paper introduces Attention Regularization, a bootstrapping method to quantify uncertainty and reduce noise in ViT attention scores for better interpretability.

Key contributions

  • Addresses the noisy, diffused attention scores in ViTs that limit interpretability.
  • Proposes a bootstrapping method to generate a baseline distribution of attention scores.
  • Uses this distribution to estimate significance and posterior probabilities of scores.
  • Demonstrates improved shrinkage and sparsity by removing spurious attention.
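The pipeline above can be sketched on a toy example. This is a minimal illustration, not the paper's implementation: it assumes a single-head attention layer with random weights, builds a bootstrap baseline by resampling input feature dimensions with replacement, computes empirical significance against that baseline, and zeroes out non-significant scores with a simple threshold (a stand-in for the paper's posterior-probability step). All shapes and the `alpha` cutoff are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_scores(X, Wq, Wk):
    # Single-head attention: softmax(Q K^T / sqrt(d)), row-normalized
    Q, K = X @ Wq, X @ Wk
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Toy setup (hypothetical shapes, not the paper's ViT)
n_tokens, d = 16, 8
X = rng.normal(size=(n_tokens, d))
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))

A = attention_scores(X, Wq, Wk)  # observed attention map

# Bootstrap baseline: resample feature dimensions to break real structure
B = 200
boot = np.empty((B,) + A.shape)
for b in range(B):
    idx = rng.integers(0, d, size=d)      # sample columns with replacement
    boot[b] = attention_scores(X[:, idx], Wq, Wk)

# Empirical p-value: fraction of bootstrap scores >= each observed score
pvals = (boot >= A).mean(axis=0)

# Regularized map: keep only scores significant at level alpha
alpha = 0.05
A_reg = np.where(pvals < alpha, A, 0.0)
```

Because every attention row sums to one but most entries are small, thresholding against the bootstrap baseline leaves a sparse map in which only scores that beat resampling noise survive.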

Why it matters

Interpreting Vision Transformers is crucial, but current attention maps are often too noisy to be truly useful. This work provides a statistical method to clean up these maps, making ViT decisions more transparent and reliable for critical applications like medical imaging.

Original Abstract

Vision transformers (ViT) rely on attention mechanism to weigh input features, and therefore attention scores have naturally been considered as explanations for its decision-making process. However, attention scores are almost always non-zero, resulting in noisy and diffused attention maps and limiting interpretability. Can we quantify uncertainty measures of attention scores and obtain regularized attention scores? To this end, we consider attention scores of ViT in a statistical framework where independent noise would lead to insignificant yet non-zero scores. Leveraging statistical learning techniques, we introduce the bootstrapping for attention scores which generates a baseline distribution of attention scores by resampling input features. Such a bootstrap distribution is then used to estimate significances and posterior probabilities of attention scores. In natural and medical images, the proposed \emph{Attention Regularization} approach demonstrates a straightforward removal of spurious attention arising from noise, drastically improving shrinkage and sparsity. Quantitative evaluations are conducted using both simulation and real-world datasets. Our study highlights bootstrapping as a practical regularization tool when using attention scores as explanations for ViT. Code available: https://github.com/ncchung/AttentionRegularization

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.