ArXiv TLDR

FlowDIS: Language-Guided Dichotomous Image Segmentation with Flow Matching

2605.05077

Andranik Sargsyan, Shant Navasardyan

cs.CV

TLDR

FlowDIS applies flow matching to dichotomous image segmentation with optional text-prompt guidance, and reports state-of-the-art accuracy both with and without language guidance.

Key contributions

  • Presents FlowDIS, a novel dichotomous image segmentation method built on flow matching.
  • Learns a time-dependent vector field to transport image distributions to mask distributions.
  • Offers strong text-prompt controllability via its Position-Aware Instance Pairing (PAIP) strategy.
  • Outperforms state-of-the-art DIS methods, achieving a 5.5% higher $F_\beta^\omega$ and 43% lower MAE than the best prior DIS method on the DIS-TE test set.
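
To make the "time-dependent vector field" idea concrete, here is a minimal sketch of a flow matching objective and sampler. It is not the FlowDIS implementation: it assumes a straight-line path between a flattened image and its mask (a common choice in flow matching, not confirmed by the abstract), omits text conditioning and PAIP, and uses hypothetical names (`interpolate`, `target_velocity`, `flow_matching_loss`, `euler_sample`, `predict_velocity`). A real model would replace `predict_velocity` with a trained network.

```python
def interpolate(image, mask, t):
    """Point on the straight path from image (t=0) to mask (t=1).

    Assumption: a linear path; the source does not state which path FlowDIS uses.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(image, mask)]


def target_velocity(image, mask):
    """Velocity along the straight path; constant in t for this path choice."""
    return [b - a for a, b in zip(image, mask)]


def flow_matching_loss(predict_velocity, image, mask, t):
    """Mean squared error between predicted and target velocity at time t."""
    x_t = interpolate(image, mask, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(image, mask)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def euler_sample(predict_velocity, image, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-size Euler steps."""
    x = list(image)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = predict_velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy 4-pixel "image" and binary "mask", flattened to lists.
    image = [0.2, 0.8, 0.5, 0.1]
    mask = [0.0, 1.0, 1.0, 0.0]

    # Stand-in for a trained network: an oracle that returns the true velocity.
    def oracle(x, t):
        return target_velocity(image, mask)

    print(flow_matching_loss(oracle, image, mask, t=0.3))
    print(euler_sample(oracle, image, num_steps=10))
```

With the oracle velocity the loss is zero and the Euler integration lands on the mask up to floating-point error; a learned field would only approximate this, which is where training comes in.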

Why it matters

Existing DIS methods often fail to preserve fine-grained details or capture full semantic structure. FlowDIS addresses these limitations by significantly improving segmentation accuracy and adding robust language-guided control, crucial for precise image editing and analysis applications.

Original Abstract

Accurate image segmentation is essential for modern computer vision applications such as image editing, autonomous driving, and medical image analysis. In recent years, Dichotomous Image Segmentation (DIS) has become a standard task for training and evaluating highly accurate segmentation models. Existing DIS approaches often fail to preserve fine-grained details or fully capture the semantic structure of the foreground. To address these challenges, we present FlowDIS, a novel dichotomous image segmentation method built on the flow matching framework, which learns a time-dependent vector field to transport the image distribution to the corresponding mask distribution, optionally conditioned on a text prompt. Moreover, with our Position-Aware Instance Pairing (PAIP) training strategy, FlowDIS offers strong controllability through text prompts, enabling precise, pixel-level object segmentation. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art approaches both with and without language guidance. Compared with the best prior DIS method, FlowDIS achieves a 5.5% higher $F_\beta^\omega$ measure and 43% lower MAE ($\mathcal{M}$) on the DIS-TE test set. The code is available at: https://github.com/Picsart-AI-Research/FlowDIS
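
For readers unfamiliar with the MAE metric quoted in the abstract, the following is a minimal sketch of mean absolute error between a predicted mask and a ground-truth mask, assuming both are same-sized 2D grids with values in [0, 1]. The function name `mean_absolute_error` and the toy values are illustrative only; the official evaluation code may differ in preprocessing and averaging over the dataset.

```python
def mean_absolute_error(pred, target):
    """Mean absolute per-pixel difference between two same-shaped masks."""
    if len(pred) != len(target) or any(
        len(p) != len(q) for p, q in zip(pred, target)
    ):
        raise ValueError("masks must have the same shape")

    total = 0.0
    count = 0
    for row_pred, row_target in zip(pred, target):
        for p, q in zip(row_pred, row_target):
            total += abs(p - q)
            count += 1
    return total / count


if __name__ == "__main__":
    # Toy 2x2 masks: two pixels agree, one is off by 0.5, one is off by 1.0.
    pred = [[1.0, 0.0], [0.5, 0.0]]
    target = [[1.0, 0.0], [1.0, 1.0]]
    print(mean_absolute_error(pred, target))  # 0.375
```

Lower is better for this metric, so a "43% lower MAE" means the reported error is 0.57 times that of the best prior method.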
