DVD: Discrete Voxel Diffusion for 3D Generation and Editing
Zhengrui Xiang, Jiaqi Wu, Fupeng Sun, Heliang Zheng, Yingzhen Li
TLDR
DVD is a discrete diffusion framework for 3D generation and editing of sparse voxels, offering improved interpretability and direct discrete modeling.
Key contributions
- Introduces Discrete Voxel Diffusion (DVD) for 3D generation, assessment, and editing of sparse voxels.
- Models voxel occupancy as a native discrete variable, avoiding continuous-to-discrete thresholding.
- Provides interpretable generation dynamics and uses predictive entropy for robust uncertainty estimation.
- Enables lightweight fine-tuning for efficient inpainting and editing of voxels in one sampling round.
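The predictive-entropy metric above can be illustrated with a minimal sketch. The paper does not publish its exact formulation, so the shapes and names here (`predictive_entropy`, a per-voxel probability array) are illustrative assumptions: the entropy of each voxel's categorical occupancy distribution is high where the model is ambiguous and near zero where it is confident.

```python
import numpy as np

def predictive_entropy(probs, eps=1e-12):
    """Entropy (in nats) of per-voxel categorical predictions.

    probs: array of shape (num_voxels, num_classes), rows summing to 1.
    High entropy flags ambiguous voxel regions; aggregating over a
    sample gives a rough difficulty/quality score. Illustrative only.
    """
    return -np.sum(probs * np.log(probs + eps), axis=-1)

# Two voxels with binary occupancy: one confident, one ambiguous.
probs = np.array([[0.99, 0.01],
                  [0.50, 0.50]])
h = predictive_entropy(probs)
# The ambiguous voxel's entropy approaches log(2); the confident one's is near 0.
```

Averaging such per-voxel entropies over a generated sample is one plausible way to rank samples for the data filtering and quality assessment tasks the paper mentions.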
Why it matters
This paper introduces a discrete diffusion approach for 3D generation that models voxel occupancy directly as a categorical variable, removing the continuous-to-discrete thresholding step that continuous diffusion pipelines require. Its explicit categorical modeling yields interpretable generation dynamics and a principled entropy-based uncertainty signal, both useful for reliable 3D content creation. The single-round inpainting and editing capability further adds practical utility.
Original Abstract
We introduce Discrete Voxel Diffusion (DVD), a discrete diffusion framework to generate, assess, and edit sparse voxels for SLat (Structured LATent) based 3D generative pipelines. Although discrete diffusion has not generally displaced continuous diffusion in image-like generation, we show that it can be an effective first-stage prior for sparse voxel scaffolds. By treating voxel occupancy as a native discrete variable, DVD avoids continuous-to-discrete thresholding and provides a simple framework for voxel generation, uncertainty estimation, and editing. Beyond quality gains, DVD provides more interpretable generation dynamics through explicit categorical modeling. Furthermore, we leverage the predictive entropy as a robust uncertainty metric to identify ambiguous voxel regions and complicated samples, facilitating tasks such as data filtering and quality assessment. Finally, we propose a lightweight fine-tuning strategy using block-structured perturbation patterns. This approach empowers the model to inpaint and edit voxels within a single sampling round, requiring negligible auxiliary computation and no additional model evaluations.
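The "block-structured perturbation patterns" used for fine-tuning are not specified in detail in the abstract; a plausible minimal sketch is masking a random axis-aligned block of the voxel grid, marking those voxels for re-noising while the rest stay fixed. The function name `block_mask` and the cubic-grid assumption are illustrative, not the paper's implementation.

```python
import numpy as np

def block_mask(grid_size, block_size, rng):
    """Select a random axis-aligned block in a cubic voxel grid.

    Returns a boolean array of shape (grid_size,)**3 where True marks
    voxels chosen for perturbation/inpainting. Hypothetical sketch of
    a block-structured perturbation pattern.
    """
    mask = np.zeros((grid_size,) * 3, dtype=bool)
    # Uniformly place the block so it fits inside the grid.
    x, y, z = rng.integers(0, grid_size - block_size + 1, size=3)
    mask[x:x + block_size, y:y + block_size, z:z + block_size] = True
    return mask

rng = np.random.default_rng(0)
m = block_mask(16, 4, rng)
# Exactly block_size**3 voxels fall inside the mask.
```

During editing, voxels inside such a mask would be resampled by the diffusion model while unmasked voxels are held to their observed values, which is consistent with the abstract's claim of inpainting within a single sampling round.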