ProtoCLIP: Prototype-Aligned Latent Refinement for Robust Zero-Shot Chest X-Ray Classification
Florian Kittler, Sheethal Bhat, Andreas Maier
TLDR
ProtoCLIP improves zero-shot chest X-ray classification by refining CLIP-style VLMs with targeted data curation and distilled anchor alignment.
Key contributions
- Develops ProtoCLIP, a VLM refinement strategy for robust zero-shot chest X-ray classification.
- Curates pathology-focused training subsets with negative samples to reduce label co-occurrence bias.
- Employs representation-preserving distillation for stable adaptation and better discrimination.
- Achieves AUC gains of 2-10 percentage points over a strong CLIP-based baseline, and state-of-the-art pneumothorax AUC (0.94), on the unseen VinDr-CXR dataset.
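The curated-subset idea above can be sketched in code. This is an illustrative guess at the curation logic, not the paper's implementation: the record schema (`"labels"` as a set of finding names) and the helper name `curate_subset` are assumptions. The key point is that negatives for a target pathology exclude images carrying confounding co-occurring findings.

```python
from typing import Dict, List, Set, Tuple

def curate_subset(
    records: List[Dict],
    target: str,
    confounders: Set[str],
) -> Tuple[List[Dict], List[Dict]]:
    """Illustrative pathology-focused curation (hypothetical schema).

    Positives: records containing the target finding.
    Negatives: records with neither the target nor any confounding
    co-occurring finding, so the negative pool does not leak
    correlated pathologies into the "absent" class.
    """
    positives = [r for r in records if target in r["labels"]]
    negatives = [
        r for r in records
        if target not in r["labels"] and not (confounders & r["labels"])
    ]
    return positives, negatives

# Toy example: effusion often co-occurs with pneumothorax, so it is
# excluded from the negative pool for a pneumothorax subset.
records = [
    {"id": 1, "labels": {"pneumothorax", "effusion"}},
    {"id": 2, "labels": {"effusion"}},
    {"id": 3, "labels": set()},
]
pos, neg = curate_subset(records, "pneumothorax", {"effusion"})
```

Here record 2 is dropped entirely: it is not a positive, but using it as a pneumothorax negative would teach the model to key on effusion cues.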
Why it matters
This paper tackles key limitations of zero-shot VLMs in medical imaging: confounding label co-occurrence, long-tail class imbalance, and instability under domain shift. ProtoCLIP's refinement strategy improves diagnostic accuracy and robustness for chest X-rays, and because it does so without large-scale retraining, it is a practical path toward clinical deployment.
Original Abstract
Zero-shot vision-language models (VLMs) have shown promise for chest radiograph classification, but their performance is often limited by confounding label co-occurrence, long-tail class imbalance, and transfer instability under domain shift. We propose ProtoCLIP, a refinement strategy for CLIP-style VLMs that improves zero-shot discrimination through targeted data curation and distilled anchor alignment. Specifically, we construct pathology-focused training subsets with curated negative samples to reduce co-occurrence bias. We also introduce a representation-preserving distillation objective to stabilize adaptation while maintaining semantic structure and improving discrimination of clinically relevant co-occurring pathologies. Evaluated on an unseen dataset VinDr-CXR, ProtoCLIP improves AUC by 2-10 percentage points over a strong CLIP-based baseline across multiple findings. For pneumothorax specifically, ProtoCLIP achieves a state-of-the-art AUC of 0.94. These results demonstrate that anchor-guided refinement, coupled with curated supervision and controlled adaptation, can mitigate common zero-shot transfer failures in medical VLMs without requiring large-scale retraining.
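The representation-preserving distillation objective described in the abstract can be sketched as a two-term loss: a task term on the refined model's predictions plus an anchor term that penalizes drift of the refined (student) image embeddings away from the frozen CLIP teacher's embeddings. The weighting `alpha`, the cosine form of the anchor, and the function name are assumptions for illustration; the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def anchored_refinement_loss(
    student_emb: torch.Tensor,   # (B, D) embeddings from the model being refined
    teacher_emb: torch.Tensor,   # (B, D) embeddings from the frozen CLIP teacher
    logits: torch.Tensor,        # (B, C) multi-label pathology logits
    labels: torch.Tensor,        # (B, C) binary finding labels
    alpha: float = 0.5,          # assumed anchor weight (illustrative)
) -> torch.Tensor:
    """Illustrative sketch of a representation-preserving objective.

    task_loss adapts the model to the curated supervision; anchor_loss
    keeps refined embeddings close to the teacher's, stabilizing
    adaptation and preserving the pretrained semantic structure.
    """
    task_loss = F.binary_cross_entropy_with_logits(logits, labels)
    # Cosine anchor: 0 when student matches teacher direction exactly.
    anchor_loss = (1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=-1)).mean()
    return task_loss + alpha * anchor_loss
```

The anchor term is what distinguishes this from plain fine-tuning: with `alpha = 0` the objective reduces to ordinary multi-label adaptation, which is prone to the transfer instability the abstract describes.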