SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, et al.
TLDR
SDXL is an enhanced latent diffusion model for text-to-image synthesis that significantly improves image quality and fidelity by using a larger UNet backbone, dual text encoders, novel conditioning, and a refinement post-processing step.
Key contributions
- Introduces a three times larger UNet backbone with more attention blocks and a second text encoder for richer cross-attention context.
- Develops novel conditioning schemes and trains on multiple aspect ratios to improve flexibility and generation quality.
- Adds a refinement model for post-hoc image-to-image enhancement, boosting visual fidelity of generated images.
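The "novel conditioning schemes" above include micro-conditioning on image size and crop coordinates: each scalar is embedded with Fourier (sinusoidal) features and the embeddings are combined and added to the timestep embedding. The sketch below illustrates that idea; the function names and the embedding dimension are illustrative choices, not SDXL's actual code.

```python
import math

def fourier_embed(value, dim=256, max_period=10000):
    """Sinusoidal (Fourier-feature) embedding of a scalar, in the style
    commonly used for diffusion timestep embeddings."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(value * f) for f in freqs] + [math.cos(value * f) for f in freqs]

def micro_condition(height, width, crop_top, crop_left, dim=256):
    """Embed each conditioning scalar separately and concatenate the
    results into one vector that can be added to the timestep embedding.
    (Hypothetical helper; dim=256 is an assumed per-scalar size.)"""
    parts = []
    for v in (height, width, crop_top, crop_left):
        parts.extend(fourier_embed(v, dim))
    return parts

emb = micro_condition(1024, 1024, 0, 0)
```

Exposing size and crop as explicit conditions lets the model train on the full, variably-sized dataset without discarding small images, and lets users request well-centered, high-resolution outputs at sampling time.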
Why it matters
SDXL advances the state of text-to-image synthesis by scaling the model architecture and innovating on conditioning methods, producing higher-resolution, more visually compelling images. By openly releasing code and weights, it promotes transparency in large-scale generative modeling and gives the broader research community access to a competitive, state-of-the-art image generator.
Original Abstract
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared to previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights at https://github.com/Stability-AI/generative-models
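The "larger cross-attention context" the abstract mentions comes from running two text encoders and concatenating their per-token outputs along the channel axis. The toy sketch below shows only that concatenation; the zero-filled arrays stand in for real encoder outputs, and the dimensions (768 for CLIP ViT-L, 1280 for OpenCLIP ViT-bigG) follow the paper.

```python
# Stand-ins for per-token outputs of the two text encoders over a
# 77-token prompt (zeros here; real encoders produce learned features).
seq_len = 77
enc_small = [[0.0] * 768 for _ in range(seq_len)]    # CLIP ViT-L-like
enc_large = [[0.0] * 1280 for _ in range(seq_len)]   # OpenCLIP ViT-bigG-like

# Channel-wise concatenation per token yields the 2048-d
# cross-attention context the UNet attends to.
context = [a + b for a, b in zip(enc_small, enc_large)]
```

Because the UNet's cross-attention layers are sized for the concatenated width, the two encoders act as one wider text conditioner rather than two separate conditioning paths.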