ArXiv TLDR

One-Step Generative Modeling via Wasserstein Gradient Flows

arXiv:2605.11755

Jiaqi Han, Puheng Li, Qiushan Guo, Renyuan Xu, Stefano Ermon + 1 more

cs.LG · cs.CV · stat.ML

TLDR

W-Flow is a one-step generative model based on Wasserstein gradient flows, achieving state-of-the-art one-step ImageNet generation with roughly 100x faster sampling than multi-step diffusion models of comparable fidelity.

Key contributions

  • Introduces W-Flow, a framework for one-step generative modeling via Wasserstein gradient flows.
  • Compresses the evolution of a Wasserstein gradient flow into a static neural generator for fast single-step sampling (a minimal sketch follows this list).
  • Achieves state-of-the-art 1.29 FID for one-step generation on ImageNet 256x256, with improved mode coverage and domain transfer.
  • Samples roughly 100x faster than multi-step diffusion models of comparable fidelity.
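
The two-stage recipe in these bullets can be made concrete on a toy problem. Below is a minimal, self-contained PyTorch sketch, assuming plain entropic OT as the energy (the paper's debiased Sinkhorn divergence also subtracts self-terms) and uniform particle weights; `sinkhorn_energy`, `run_flow`, and `distill` are illustrative names, not the paper's API, and none of its architecture or hyperparameters are reproduced here.

```python
import torch


def sinkhorn_energy(x, y, blur=0.5, iters=50):
    # Entropic-OT energy between uniform particle sets x (N x D) and y (M x D),
    # via a log-domain Sinkhorn loop on the half-squared-distance cost.
    # (The paper's debiased Sinkhorn divergence would also subtract the
    # self-terms S(x, x) and S(y, y); omitted here for brevity.)
    eps = blur ** 2
    cost = torch.cdist(x, y) ** 2 / 2
    log_mu = -torch.log(torch.tensor(float(x.shape[0])))  # uniform weights
    log_nu = -torch.log(torch.tensor(float(y.shape[0])))
    f = torch.zeros(x.shape[0])
    g = torch.zeros(y.shape[0])
    for _ in range(iters):  # alternating updates of the dual potentials
        f = -eps * torch.logsumexp((g[None, :] - cost) / eps + log_nu, dim=1)
        g = -eps * torch.logsumexp((f[:, None] - cost) / eps + log_mu, dim=0)
    return f.mean() + g.mean()  # approximates OT_eps(x, y) at convergence


def run_flow(z, data, steps=200, step_size=0.3):
    # Stage 1: explicit-Euler discretization of the gradient flow. Each
    # particle descends the energy landscape, transporting the reference
    # cloud z toward the data distribution.
    y = z.clone()
    for _ in range(steps):
        y = y.detach().requires_grad_(True)
        (grad,) = torch.autograd.grad(sinkhorn_energy(y, data), y)
        y = y - step_size * y.shape[0] * grad  # undo the 1/N empirical weight
    return y.detach()


def distill(z, y_final, epochs=2000):
    # Stage 2: compress the start-to-end transport map into a static one-step
    # generator by regressing G(z_i) onto the flow's terminal particle y_i.
    d = z.shape[1]
    gen = torch.nn.Sequential(
        torch.nn.Linear(d, 128), torch.nn.SiLU(),
        torch.nn.Linear(128, 128), torch.nn.SiLU(),
        torch.nn.Linear(128, d),
    )
    opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        ((gen(z) - y_final) ** 2).mean().backward()
        opt.step()
    return gen


# Toy run: Gaussian reference to a shifted Gaussian target. The real method
# would run the flow over many reference batches so the generator also maps
# fresh, unseen noise correctly.
data = torch.randn(512, 2) * 0.25 + torch.tensor([2.0, 1.0])
z = torch.randn(512, 2)
gen = distill(z, run_flow(z, data))
samples = gen(torch.randn(512, 2))  # one network pass per sample at test time
```

Note where the speedup comes from: the iterative flow is paid once during training, while test-time sampling is a single evaluation of `gen`, which is what the paper's ~100x comparison against multi-step samplers measures.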

Why it matters

This paper addresses the high sampling cost of current generative models such as diffusion models by proposing W-Flow, a one-step generation framework. It achieves state-of-the-art results on ImageNet with roughly 100x faster sampling than comparable multi-step diffusion models, making high-fidelity generative modeling markedly more practical and efficient.

Original Abstract

Diffusion models and flow-based methods have shown impressive generative capability, especially for images, but their sampling is expensive because it requires many iterative updates. We introduce W-Flow, a framework for training a generator that transforms samples from a simple reference distribution into samples from a target data distribution in a single step. This is achieved in two steps: we first define an evolution from the reference distribution to the target distribution through a Wasserstein gradient flow that minimizes an energy functional; second, we train a static neural generator to compress this evolution into one-step generation. We instantiate the energy functional with the Sinkhorn divergence, which yields an efficient optimal-transport-based update rule that captures global distributional discrepancy and improves coverage of the target distribution. We further prove that the finite-sample training dynamics converge to the continuous-time distributional dynamics under suitable assumptions. Empirically, W-Flow sets a new state of the art for one-step ImageNet 256$\times$256 generation, achieving 1.29 FID, with improved mode coverage and domain transfer. Compared to multi-step diffusion models with similar FID scores, our method yields approximately 100$\times$ faster sampling. These results show that Wasserstein gradient flows provide a principled and effective foundation for fast and high-fidelity generative modeling.
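
For context, the two objects the abstract relies on have standard forms in the optimal-transport literature (the paper's precise assumptions and notation may differ). A Wasserstein gradient flow of an energy functional $E$ evolves a density $\rho_t$ by a continuity equation, and the Sinkhorn divergence is the debiased version of entropy-regularized optimal transport $\mathrm{OT}_\varepsilon$:

```latex
% Wasserstein gradient flow of E: steepest descent in 2-Wasserstein geometry
\partial_t \rho_t = \nabla \cdot \Big( \rho_t \, \nabla \tfrac{\delta E}{\delta \rho}(\rho_t) \Big)

% Debiased Sinkhorn divergence between measures \alpha and \beta
S_\varepsilon(\alpha,\beta) = \mathrm{OT}_\varepsilon(\alpha,\beta)
  - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\alpha,\alpha)
  - \tfrac{1}{2}\,\mathrm{OT}_\varepsilon(\beta,\beta)
```

The debiasing terms remove the entropic blur of $\mathrm{OT}_\varepsilon$, so $S_\varepsilon(\alpha,\beta) = 0$ iff $\alpha = \beta$, which is what makes it a sensible energy functional to drive the flow toward the data distribution.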
