ArXiv TLDR

SD-FSMIS: Adapting Stable Diffusion for Few-Shot Medical Image Segmentation

2604.03134

Meihua Li, Yang Zhang, Weizhao He, Hu Qu, Yisong Li

cs.CV

TLDR

SD-FSMIS adapts Stable Diffusion for few-shot medical image segmentation, achieving competitive results and strong cross-domain generalization.

Key contributions

  • Introduces SD-FSMIS, a framework adapting Stable Diffusion for Few-Shot Medical Image Segmentation.
  • Repurposes SD's architecture with Support-Query Interaction (SQI) for FSMIS adaptation.
  • Uses a Visual-to-Textual Condition Translator (VTCT) to guide the diffusion model with visual cues.
  • Achieves competitive results and excellent cross-domain generalization in FSMIS.
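The summary names the two components but not their internals. The VTCT's role, turning pooled support-image features into a text-shaped conditioning sequence for the diffusion model, can be sketched as a linear projection into pseudo-text tokens. This is only an illustrative assumption: the function name, the 512-dim visual features, and the 8×768 token shape (chosen to resemble SD's CLIP text-embedding space) are not from the paper.

```python
import numpy as np

def vtct_translate(support_feats, W, num_tokens=8, txt_dim=768):
    """Hypothetical VTCT sketch: linearly project pooled support
    features (B, vis_dim) into (B, num_tokens, txt_dim) pseudo-text
    embeddings that could condition a diffusion UNet's cross-attention."""
    flat = support_feats @ W                      # (B, num_tokens * txt_dim)
    return flat.reshape(-1, num_tokens, txt_dim)  # (B, num_tokens, txt_dim)

rng = np.random.default_rng(0)
feats = rng.standard_normal((2, 512))             # pooled support features
W = rng.standard_normal((512, 8 * 768)) * 0.01    # projection weights
cond = vtct_translate(feats, W)
print(cond.shape)  # (2, 8, 768)
```

In a real system the projection would be a learned module and the output would replace (or augment) the text-encoder output fed to the UNet; here it only fixes the shapes of the interface.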

Why it matters

This paper addresses data scarcity in medical imaging by leveraging large-scale diffusion models for few-shot segmentation. SD-FSMIS demonstrates that adapting powerful generative models can lead to robust and data-efficient solutions, especially for challenging cross-domain medical tasks.

Original Abstract

Few-Shot Medical Image Segmentation (FSMIS) aims to segment novel object classes in medical images using only minimal annotated examples, addressing the critical challenges of data scarcity and domain shifts prevalent in medical imaging. While Diffusion Models (DMs) excel in visual tasks, their potential for FSMIS remains largely unexplored. We propose that the rich visual priors learned by large-scale DMs offer a powerful foundation for a more robust and data-efficient segmentation approach. In this paper, we introduce SD-FSMIS, a novel framework designed to effectively adapt the powerful pre-trained Stable Diffusion (SD) model for the FSMIS task. Our approach repurposes its conditional generative architecture by introducing two key components: a Support-Query Interaction (SQI) and a Visual-to-Textual Condition Translator (VTCT). Specifically, SQI provides a straightforward yet powerful means of adapting SD to the FSMIS paradigm. The VTCT module translates visual cues from the support set into an implicit textual embedding that guides the diffusion model, enabling precise conditioning of the generation process. Extensive experiments demonstrate that SD-FSMIS achieves competitive results compared to state-of-the-art methods in standard settings. Surprisingly, it also demonstrates excellent generalization in more challenging cross-domain scenarios. These findings highlight the immense potential of adapting large-scale generative models to advance data-efficient and robust medical image segmentation.
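The abstract does not detail how SQI relates support and query images. A common baseline operation in FSMIS, offered here purely as an assumption about what such interaction might involve, is masked average pooling of support features into a class prototype, then comparing query features against that prototype:

```python
import numpy as np

def masked_average_pool(support_feat, support_mask):
    """Standard FSMIS prototype extraction (illustrative only; not the
    paper's SQI): average the support feature map over the foreground mask.
    support_feat: (C, H, W), support_mask: (H, W) binary."""
    fg = support_feat * support_mask[None]
    return fg.sum(axis=(1, 2)) / (support_mask.sum() + 1e-8)  # (C,)

def similarity_map(query_feat, prototype):
    """Cosine similarity between each query location and the prototype,
    yielding a coarse (H, W) foreground score map."""
    q = query_feat / (np.linalg.norm(query_feat, axis=0, keepdims=True) + 1e-8)
    p = prototype / (np.linalg.norm(prototype) + 1e-8)
    return np.einsum('chw,c->hw', q, p)

feat = np.random.default_rng(1).standard_normal((16, 4, 4))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0  # toy foreground region
proto = masked_average_pool(feat, mask)
sim = similarity_map(feat, proto)
print(proto.shape, sim.shape)  # (16,) (4, 4)
```

Whatever form SQI actually takes, its output would serve the same purpose as this similarity map: injecting support-derived class evidence into the query-side computation.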
