ArXiv TLDR

Benchmarking Pathology Foundation Models for Breast Cancer Survival Prediction

arXiv:2604.24679

Fredrik K. Gustafsson, Constance Boissin, Johan Vallon-Christersson, David A. Clifton, Mattias Rantalainen

cs.CV cs.LG

TLDR

This paper benchmarks pathology foundation models for breast cancer survival prediction, finding that H-optimus-1 performs best and that compact distilled models are surprisingly effective.

Key contributions

  • First large-scale, externally validated benchmark of PFMs for breast cancer survival prediction.
  • H-optimus-1 consistently achieved the strongest survival prediction performance.
  • Second-generation PFMs generally outperformed first-generation models.
  • The compact distilled H0-mini slightly outperformed its larger teacher, H-optimus-0, using fewer than 8% of the parameters.

Why it matters

This is the first large-scale, externally validated benchmark of pathology foundation models for breast cancer survival prediction. Its findings offer practical guidance for clinical deployment: they identify the strongest-performing models and show that compact distilled models can match larger ones at a fraction of the compute cost.

Original Abstract

Pathology foundation models (PFMs) have recently emerged as powerful pretrained encoders for computational pathology, enabling transfer learning across a wide range of downstream tasks. However, systematic comparisons of these models for clinically meaningful prediction problems remain limited, especially in the context of survival prediction under external validation. In this study, we benchmark widely used and recently proposed PFMs for breast cancer survival prediction from whole-slide histopathology images. Using a standardized pipeline based on patch-level feature extraction and a unified survival modeling framework, we evaluate model representations across three independent clinical cohorts comprising more than 5,400 patients with long-term follow-up. Models are trained on one cohort and evaluated on two independent external cohorts, enabling a rigorous assessment of cross-dataset generalization. Overall, H-optimus-1 achieves the strongest survival prediction performance. More broadly, we observe consistent generational improvements across model families, with second-generation PFMs outperforming their first-generation counterparts. However, absolute performance differences between many recent PFMs remain modest, suggesting diminishing returns from further scaling of pretraining data or model size alone. Notably, the compact distilled model H0-mini slightly outperforms its larger teacher model H-optimus-0, despite using fewer than 8% of the parameters and enabling significantly faster feature extraction. Together, these results provide the first large-scale, externally validated benchmark of PFMs for breast cancer survival prediction, and offer practical guidance for efficient deployment of PFMs in clinical workflows.
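The cross-cohort evaluation described in the abstract hinges on a survival metric that handles right-censored follow-up. The abstract does not name the metric, but a standard choice for this task is Harrell's concordance index (C-index): the fraction of comparable patient pairs in which the higher predicted risk corresponds to the shorter observed survival. A minimal pure-Python sketch (an illustration of the standard metric, not the paper's actual code):

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    times:       observed follow-up times (event or censoring).
    events:      1 if the event (death) was observed, 0 if censored.
    risk_scores: model-predicted risk; higher should mean shorter survival.

    A pair (i, j) is comparable only if patient i has the shorter time
    AND an observed event; censored patients cannot anchor a pair.
    """
    concordant, tied, comparable = 0, 0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if times[i] < times[j] and events[i] == 1:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1          # higher risk, earlier event
                elif risk_scores[i] == risk_scores[j]:
                    tied += 1                # ties count as half
    return (concordant + 0.5 * tied) / comparable

# Perfectly ordered toy cohort: risk decreases as survival time increases.
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1],
                        [0.9, 0.7, 0.5, 0.3]))  # → 1.0
```

In a benchmark like this one, the model (e.g., a Cox head over slide-level PFM features) is fit on the training cohort, and the C-index is computed on each held-out external cohort; 0.5 corresponds to random ranking and 1.0 to perfect risk ordering.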
