Generalizable Sparse-View 3D Reconstruction from Unconstrained Images
Vinayak Gupta, Chih-Hao Lin, Shenlong Wang, Anand Bhattad, Jia-Bin Huang
TLDR
GenWildSplat enables generalizable, real-time 3D reconstruction from sparse, unconstrained images without per-scene optimization.
Key contributions
- Introduces GenWildSplat, a feed-forward framework for sparse-view 3D reconstruction from unconstrained images.
- Predicts depth, camera parameters, and 3D Gaussians using learned geometric priors in a canonical space.
- Uses an appearance adapter for lighting and semantic segmentation for transient objects, enhancing generalization.
- Achieves state-of-the-art feed-forward rendering quality and real-time inference without test-time optimization.
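The pipeline described above can be sketched in pseudocode. This is a minimal, illustrative mock-up of the feed-forward flow (geometry prediction → appearance modulation → transient masking), not the paper's actual architecture or API; every function name, data shape, and the simple gain-based lighting model are assumptions made for illustration.

```python
# Hypothetical sketch of a GenWildSplat-style feed-forward pass.
# All names, shapes, and the gain-based lighting model are illustrative
# placeholders, not the paper's implementation.

def predict_geometry(images):
    """Stand-in for the learned backbone: per-view depth, camera poses,
    and per-pixel 3D Gaussians in a canonical space (all placeholders)."""
    gaussians = [
        {"mean": (0.0, 0.0, 0.0), "color": px, "opacity": 1.0}
        for view in images for px in view
    ]
    depth = [[1.0] * len(view) for view in images]   # placeholder depth maps
    cameras = [None] * len(images)                   # placeholder camera poses
    return depth, cameras, gaussians

def appearance_adapter(gaussians, gain):
    """Modulate Gaussian appearance toward a target lighting condition
    (here just a scalar gain, purely for illustration)."""
    return [dict(g, color=tuple(c * gain for c in g["color"])) for g in gaussians]

def mask_transients(gaussians, transient_flags):
    """Zero the opacity of Gaussians flagged as transient by segmentation."""
    return [dict(g, opacity=0.0 if flag else g["opacity"])
            for g, flag in zip(gaussians, transient_flags)]

# One feed-forward pass -- no per-scene optimization loop.
images = [[(0.5, 0.5, 0.5)] * 4 for _ in range(3)]   # 3 tiny unposed views
depth, cameras, gaussians = predict_geometry(images)
gaussians = appearance_adapter(gaussians, gain=0.8)
flags = [i < 2 for i in range(len(gaussians))]        # pretend two are transient
gaussians = mask_transients(gaussians, flags)
```

The point of the sketch is the control flow: a single forward pass produces all scene parameters, in contrast to per-scene methods that run an optimization loop over appearance embeddings or masks for every new scene.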
Why it matters
This paper addresses a critical challenge in 3D reconstruction: generalizing across diverse real-world conditions from sparse input. By eliminating per-scene optimization, GenWildSplat significantly improves efficiency and broadens the applicability of 3D reconstruction, and its real-time inference makes it practical for interactive use.
Original Abstract
Reconstructing 3D scenes from sparse, unposed images remains challenging under real-world conditions with varying illumination and transient occlusions. Existing methods rely on scene-specific optimization using appearance embeddings or dynamic masks, which requires extensive per-scene training and fails under sparse views. Moreover, evaluations on limited scenes raise questions about generalization. We present GenWildSplat, a feed-forward framework for sparse-view outdoor reconstruction that requires no per-scene optimization. Given unposed internet images, GenWildSplat predicts depth, camera parameters, and 3D Gaussians in a canonical space using learned geometric priors. An appearance adapter modulates appearance for target lighting conditions, while semantic segmentation handles transient objects. Through curriculum learning on synthetic and real data, GenWildSplat generalizes across diverse illumination and occlusion patterns. Evaluations on the PhotoTourism and MegaScenes benchmarks demonstrate state-of-the-art feed-forward rendering quality, achieving real-time inference without test-time optimization.