SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, et al.
TLDR
SDXL is an enhanced latent diffusion model for text-to-image synthesis that significantly improves image quality and fidelity by using a larger UNet backbone, dual text encoders, novel conditioning, and a refinement post-processing step.
Key contributions
- Introduces a three times larger UNet backbone with more attention blocks and a second text encoder for richer cross-attention context.
- Develops novel conditioning schemes and trains on multiple aspect ratios to improve flexibility and generation quality.
- Adds a refinement model for post-hoc image-to-image enhancement, boosting visual fidelity of generated images.
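The "novel conditioning schemes" above include micro-conditioning on image size and crop coordinates: each scalar is embedded with Fourier (sinusoidal) features and the embeddings are combined and added to the timestep embedding. The sketch below illustrates that idea; the function names and the embedding dimension are illustrative choices, not SDXL's actual code.

```python
import math

def fourier_embed(value, dim=256, max_period=10000):
    """Sinusoidal (Fourier-feature) embedding of a scalar, in the style
    commonly used for diffusion timestep embeddings."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(value * f) for f in freqs] + [math.cos(value * f) for f in freqs]

def micro_condition(height, width, crop_top, crop_left, dim=256):
    """Embed each conditioning scalar separately and concatenate the
    results into one vector that can be added to the timestep embedding.
    (Hypothetical helper; dim=256 is an assumed per-scalar size.)"""
    parts = []
    for v in (height, width, crop_top, crop_left):
        parts.extend(fourier_embed(v, dim))
    return parts

emb = micro_condition(1024, 1024, 0, 0)
```

Exposing size and crop as explicit conditions lets the model train on the full, variably-sized dataset without discarding small images, and lets users request well-centered, high-resolution outputs at sampling time.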
Why it matters
SDXL advances the state of text-to-image synthesis by scaling the model architecture and innovating on conditioning methods, producing higher-resolution, more visually compelling images. By openly releasing code and weights, it promotes transparency in large-scale generative modeling and gives the broader research community access to a competitive, state-of-the-art image generator.
Original Abstract
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights at https://github.com/Stability-AI/generative-models
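The "larger cross-attention context" the abstract mentions comes from running two text encoders and concatenating their per-token outputs along the channel axis. The toy sketch below shows only that concatenation; the zero-filled arrays stand in for real encoder outputs, and the dimensions (768 for CLIP ViT-L, 1280 for OpenCLIP ViT-bigG) follow the paper.

```python
# Stand-ins for per-token outputs of the two text encoders over a
# 77-token prompt (zeros here; real encoders produce learned features).
seq_len = 77
enc_small = [[0.0] * 768 for _ in range(seq_len)]    # CLIP ViT-L-like
enc_large = [[0.0] * 1280 for _ in range(seq_len)]   # OpenCLIP ViT-bigG-like

# Channel-wise concatenation per token yields the 2048-d
# cross-attention context the UNet attends to.
context = [a + b for a, b in zip(enc_small, enc_large)]
```

Because the UNet's cross-attention layers are sized for the concatenated width, the two encoders act as one wider text conditioner rather than two separate conditioning paths.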