ProtoCLIP: Prototype-Aligned Latent Refinement for Robust Zero-Shot Chest X-Ray Classification
Florian Kittler, Sheethal Bhat, Andreas Maier
TLDR
ProtoCLIP improves zero-shot chest X-ray classification by refining CLIP-style VLMs with targeted data curation and distilled anchor alignment.
Key contributions
- Develops ProtoCLIP, a VLM refinement strategy for robust zero-shot chest X-ray classification.
- Curates pathology-focused training subsets with negative samples to reduce label co-occurrence bias.
- Employs representation-preserving distillation for stable adaptation and better discrimination.
- Achieves AUC gains of 2-10 percentage points over a strong CLIP-based baseline, and state-of-the-art pneumothorax AUC (0.94), on the unseen VinDr-CXR dataset.
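The curated-subset idea above can be sketched in code. This is an illustrative guess at the curation logic, not the paper's implementation: the record schema (`"labels"` as a set of finding names) and the helper name `curate_subset` are assumptions. The key point is that negatives for a target pathology exclude images carrying confounding co-occurring findings.

```python
from typing import Dict, List, Set, Tuple

def curate_subset(
    records: List[Dict],
    target: str,
    confounders: Set[str],
) -> Tuple[List[Dict], List[Dict]]:
    """Illustrative pathology-focused curation (hypothetical schema).

    Positives: records containing the target finding.
    Negatives: records with neither the target nor any confounding
    co-occurring finding, so the negative pool does not leak
    correlated pathologies into the "absent" class.
    """
    positives = [r for r in records if target in r["labels"]]
    negatives = [
        r for r in records
        if target not in r["labels"] and not (confounders & r["labels"])
    ]
    return positives, negatives

# Toy example: effusion often co-occurs with pneumothorax, so it is
# excluded from the negative pool for a pneumothorax subset.
records = [
    {"id": 1, "labels": {"pneumothorax", "effusion"}},
    {"id": 2, "labels": {"effusion"}},
    {"id": 3, "labels": set()},
]
pos, neg = curate_subset(records, "pneumothorax", {"effusion"})
```

Here record 2 is dropped entirely: it is not a positive, but using it as a pneumothorax negative would teach the model to key on effusion cues.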
Why it matters
This paper tackles key limitations of zero-shot VLMs in medical imaging: confounding label co-occurrence, long-tail class imbalance, and instability under domain shift. ProtoCLIP's refinement strategy improves diagnostic accuracy and robustness for chest X-rays, and because it does so without large-scale retraining, it is a practical path toward clinical deployment.
Original Abstract
Zero-shot vision-language models (VLMs) have shown promise for chest radiograph classification, but their performance is often limited by confounding label co-occurrence, long-tail class imbalance, and transfer instability under domain shift. We propose ProtoCLIP, a refinement strategy for CLIP-style VLMs that improves zero-shot discrimination through targeted data curation and distilled anchor alignment. Specifically, we construct pathology-focused training subsets with curated negative samples to reduce co-occurrence bias. We also introduce a representation-preserving distillation objective to stabilize adaptation while maintaining semantic structure and improving discrimination of clinically relevant co-occurring pathologies. Evaluated on an unseen dataset VinDr-CXR, ProtoCLIP improves AUC by 2-10 percentage points over a strong CLIP-based baseline across multiple findings. For pneumothorax specifically, ProtoCLIP achieves a state-of-the-art AUC of 0.94. These results demonstrate that anchor-guided refinement, coupled with curated supervision and controlled adaptation, can mitigate common zero-shot transfer failures in medical VLMs without requiring large-scale retraining.
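The representation-preserving distillation objective described in the abstract can be sketched as a two-term loss: a task term on the refined model's predictions plus an anchor term that penalizes drift of the refined (student) image embeddings away from the frozen CLIP teacher's embeddings. The weighting `alpha`, the cosine form of the anchor, and the function name are assumptions for illustration; the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def anchored_refinement_loss(
    student_emb: torch.Tensor,   # (B, D) embeddings from the model being refined
    teacher_emb: torch.Tensor,   # (B, D) embeddings from the frozen CLIP teacher
    logits: torch.Tensor,        # (B, C) multi-label pathology logits
    labels: torch.Tensor,        # (B, C) binary finding labels
    alpha: float = 0.5,          # assumed anchor weight (illustrative)
) -> torch.Tensor:
    """Illustrative sketch of a representation-preserving objective.

    task_loss adapts the model to the curated supervision; anchor_loss
    keeps refined embeddings close to the teacher's, stabilizing
    adaptation and preserving the pretrained semantic structure.
    """
    task_loss = F.binary_cross_entropy_with_logits(logits, labels)
    # Cosine anchor: 0 when student matches teacher direction exactly.
    anchor_loss = (1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=-1)).mean()
    return task_loss + alpha * anchor_loss
```

The anchor term is what distinguishes this from plain fine-tuning: with `alpha = 0` the objective reduces to ordinary multi-label adaptation, which is prone to the transfer instability the abstract describes.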