GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction
Xiao Cao, Yuze Li, Youmin Zhang, Jiayu Song, Cheng Yan + 2 more
TLDR
GeoQuery improves sparse-view 3D reconstruction with 3D Gaussian Splatting by integrating geometry-guided diffusion and a novel cross-view attention mechanism.
Key contributions
- Introduces GeoQuery, a geometry-guided diffusion framework for robust sparse-view 3D reconstruction.
- Proposes Geometry-guided Cross-view Attention (GCA) to avoid the erroneous cross-view retrieval that occurs when 3DGS renderings are heavily corrupted.
- Leverages depth maps and camera poses to form geometry-aligned proxy queries, replacing corrupted features.
- Restricts cross-view attention to local windows, effectively retrieving useful features and suppressing spurious matches.
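As a rough illustration of the proxy-query idea in the list above, the sketch below maps one pixel of a novel view into a reference view using its depth and the relative camera pose. It assumes a plain pinhole camera model. The function name `reproject_pixel`, the intrinsics tuple layout, and the pose convention are hypothetical choices made for this example and are not taken from the paper.

```python
def reproject_pixel(u, v, depth, k_tgt, k_ref, rot, trans):
    """Map a target-view pixel to reference-view pixel coordinates.

    Assumptions (illustrative, not from the paper):
      k_tgt, k_ref: pinhole intrinsics as (fx, fy, cx, cy)
      rot (3x3 nested lists), trans (length-3 list): rigid transform taking
        target-camera coordinates to reference-camera coordinates
    Returns (u_ref, v_ref), or None if the point lies behind the reference camera.
    """
    fx, fy, cx, cy = k_tgt
    # Back-project the pixel to a 3D point in the target camera frame.
    point = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Move the point into the reference camera frame.
    q = [sum(rot[i][j] * point[j] for j in range(3)) + trans[i] for i in range(3)]
    if q[2] <= 0:
        return None
    # Project with the reference intrinsics.
    fxr, fyr, cxr, cyr = k_ref
    return (fxr * q[0] / q[2] + cxr, fyr * q[1] / q[2] + cyr)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    intrinsics = (100.0, 100.0, 64.0, 48.0)
    # A sideways camera shift moves the correspondence horizontally.
    print(reproject_pixel(50.0, 40.0, 2.0, intrinsics, intrinsics, identity, [0.1, 0.0, 0.0]))
```

Sampling the reference feature map at the returned location would give a feature that depends on geometry rather than on the corrupted rendering, which is the role the proxy query plays in the description above. How the paper samples and combines features across several reference views is not specified here.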
Why it matters
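3DGS produces severe artifacts when trained on only a few views, and diffusion-based refinement that retrieves reference information through multi-view self-attention breaks down when the rendered views are heavily corrupted, because the damaged query features retrieve the wrong content. GeoQuery replaces those queries with geometry-aligned proxies built from predicted depth maps and camera poses, so cross-view retrieval stays reliable even under extreme view sparsity. The mechanism can be integrated into existing diffusion-based pipelines.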
Original Abstract
3D Gaussian Splatting (3DGS) has emerged as a prominent paradigm for 3D reconstruction and novel view synthesis. However, it remains vulnerable to severe artifacts when trained under sparse-view constraints. While recent methods attempt to rectify artifacts in rendered views using image diffusion models, they typically rely on multi-view self-attention to retrieve information from reference images. We observe that this mechanism often fails when the rendered novel views output by 3DGS are heavily corrupted: damaged query features lead to erroneous cross-view retrieval, resulting in inconsistent rendering refinement. To address this, we propose GeoQuery, a geometry-guided diffusion framework that integrates generative priors with explicit geometric cues via a novel Geometry-guided Cross-view Attention (GCA) mechanism. First, by leveraging predicted depth maps and camera poses, we construct a geometry-induced correspondence field to sample reference features, forming a geometry-aligned proxy query that replaces the corrupted rendering features. Furthermore, we design a new cross-view feature aggregation pipeline, in which we restrict the cross-view attention to a local window around each proxy query to effectively retrieve useful features while suppressing spurious matches. GeoQuery can be seamlessly integrated into existing diffusion-based pipelines, enabling robust reconstruction even under extreme view sparsity. Extensive experiments on sparse-view novel view synthesis and rendering artifact removal demonstrate the effectiveness of our approach.
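The local-window restriction described in the abstract can be illustrated with a minimal sketch: one proxy query attends only to reference features within a small window around its location. This is a simplified, assumption-laden example. It uses the reference features as both keys and values, has no learned projections, and the name `local_window_attention` and the `radius` parameter are hypothetical rather than the paper's GCA implementation.

```python
import math


def local_window_attention(query, ref_feats, center, radius=1):
    """Softmax attention from one query over a (2*radius+1)^2 window.

    Assumptions (illustrative, not from the paper):
      query: list of C floats (the proxy query)
      ref_feats: H x W x C nested lists of reference features
      center: (row, col) in-bounds integer location of the proxy query
    Reference features act as both keys and values.
    """
    height, width = len(ref_feats), len(ref_feats[0])
    row0, col0 = center
    # Collect only the features inside the window, clipped to the image bounds.
    keys = [
        ref_feats[r][c]
        for r in range(max(0, row0 - radius), min(height, row0 + radius + 1))
        for c in range(max(0, col0 - radius), min(width, col0 + radius + 1))
    ]
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Subtract the maximum logit for a numerically stable softmax.
    top = max(logits)
    weights = [math.exp(value - top) for value in logits]
    total = sum(weights)
    return [
        sum(w * key[i] for w, key in zip(weights, keys)) / total
        for i in range(len(query))
    ]
```

Features outside the window never enter the softmax, so a strong but geometrically implausible match elsewhere in the reference image cannot influence the result. That is the intuition behind suppressing spurious matches.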