ArXiv TLDR

Masked Generative Transformer Is What You Need for Image Editing

2605.10859

Wei Chow, Linfeng Li, Xian Sun, Lingdong Kong, Zefeng Li + 12 more

cs.CV · cs.LG

TLDR

EditMGT, a Masked Generative Transformer framework, confines image edits to the intended region, matching or surpassing diffusion models in quality while editing 6x faster.

Key contributions

  • Presents EditMGT, a novel Masked Generative Transformer (MGT) framework for localized image editing.
  • Uses multi-layer attention consolidation and region-hold sampling to keep edits precise and confined to target regions.
  • Constructs CrispEdit-2M, a 2M-sample, high-resolution (>1024) editing dataset spanning seven categories.
  • Achieves state-of-the-art image similarity while editing 6x faster than diffusion models.
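The paper does not describe region-hold sampling in implementation detail here, but the idea stated in the abstract — preventing token flipping in non-target areas during iterative masked decoding — can be sketched as follows. This is a minimal, illustrative numpy version: `predict_fn`, the confidence-ordered commit schedule, and all names are assumptions, not the authors' code.

```python
import numpy as np

def region_hold_sample(orig_tokens, edit_mask, predict_fn, steps=8):
    """Iterative MGT-style decoding that re-predicts only tokens inside
    the edit region; positions outside `edit_mask` are reset to their
    original values after every step, so edits cannot leak outward.
    (Hypothetical sketch, not the paper's implementation.)"""
    tokens = orig_tokens.copy()
    masked = edit_mask.copy()                # positions still awaiting a value
    for step in range(steps):
        logits = predict_fn(tokens)          # (N, vocab) per-token logits
        conf = logits.max(axis=-1)           # model confidence per position
        pred = logits.argmax(axis=-1)
        # commit the most confident fraction of still-masked edit tokens
        k = max(1, int(masked.sum() * (step + 1) / steps))
        scores = np.where(masked, conf, -np.inf)
        commit = np.argsort(-scores)[:k]
        tokens[commit] = pred[commit]
        masked[commit] = False
        tokens[~edit_mask] = orig_tokens[~edit_mask]   # the "region hold"
        if not masked.any():
            break
    return tokens
```

The hold step is the key difference from plain masked decoding: even if the model's predictions drift outside the edit region, those positions are overwritten with the source tokens at every iteration.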

Why it matters

By replacing global denoising with localized token prediction, EditMGT keeps edits confined to the intended region, a persistent weakness of diffusion-based editors, whose changes tend to propagate into areas that should remain intact. With only 960M parameters it reaches state-of-the-art image similarity while editing 6x faster, making MGTs a compelling alternative for content-creation workflows.

Original Abstract

Diffusion models dominate image editing, yet their global denoising mechanism entangles edited regions with surrounding context, causing modifications to propagate into areas that should remain intact. We propose a fundamentally different approach by leveraging Masked Generative Transformers (MGTs), whose localized token-prediction paradigm naturally confines changes to intended regions. We present EditMGT, an MGT-based editing framework that is the first of its kind. Our approach employs multi-layer attention consolidation to aggregate cross-attention maps into precise edit localization signals, and region-hold sampling to explicitly prevent token flipping in non-target areas. To support training, we construct CrispEdit-2M, a 2M-sample high-resolution (>1024) editing dataset spanning seven categories. With only 960M parameters, EditMGT achieves state-of-the-art image similarity on multiple benchmarks while delivering 6x faster editing, demonstrating that MGTs offer a compelling alternative to diffusion-based editing.
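The abstract's "multi-layer attention consolidation" — aggregating cross-attention maps into an edit-localization signal — can be sketched in a few lines. This is a generic illustration under the assumption that consolidation amounts to averaging maps across layers and heads, normalizing, and thresholding; the paper's exact aggregation rule may differ.

```python
import numpy as np

def consolidate_attention(attn_maps, threshold=0.5):
    """Aggregate cross-attention maps into a binary edit-localization mask.
    attn_maps: (layers, heads, H, W) attention from the edit-prompt tokens
    to image positions. Assumed scheme: mean over layers and heads,
    min-max normalize to [0, 1], then threshold."""
    m = attn_maps.mean(axis=(0, 1))                  # (H, W) consolidated map
    m = (m - m.min()) / (m.max() - m.min() + 1e-8)   # min-max normalize
    return m >= threshold                            # True where the edit applies
```

A mask like this would then drive region-hold sampling: positions where the mask is False are the "non-target areas" whose tokens must not flip.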

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.