ArXiv TLDR

GeoQuery: Geometry-Query Diffusion for Sparse-View Reconstruction

arXiv:2605.12399

Xiao Cao, Yuze Li, Youmin Zhang, Jiayu Song, Cheng Yan + 2 more

cs.CV

TLDR

GeoQuery makes diffusion-based refinement of sparse-view 3D Gaussian Splatting more robust by replacing corrupted query features with geometry-aligned proxy queries in a new cross-view attention mechanism.

Key contributions

  • Introduces GeoQuery, a geometry-guided diffusion framework for robust sparse-view 3D reconstruction.
  • Proposes Geometry-guided Cross-view Attention (GCA), which keeps cross-view retrieval reliable when the novel views rendered by 3DGS are heavily corrupted.
  • Leverages depth maps and camera poses to form geometry-aligned proxy queries, replacing corrupted features.
  • Restricts cross-view attention to a local window around each proxy query, retrieving useful features while suppressing spurious matches.
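
The proxy-query idea in the bullets above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `correspondence_field` and `sample_proxy_query` are hypothetical, and the sketch assumes a pinhole camera, intrinsics shared by both views, z-depth, and nearest-neighbor sampling. Each novel-view pixel is back-projected with its depth, moved into the reference camera frame, and re-projected to find where to read reference features.

```python
import numpy as np

def correspondence_field(depth, K, T_novel_to_ref):
    """Map each novel-view pixel to (u, v) coordinates in the reference view.

    depth:          (H, W) z-depth of the novel view (assumption)
    K:              (3, 3) pinhole intrinsics, assumed shared by both views
    T_novel_to_ref: (4, 4) rigid transform, novel-camera -> reference-camera
    """
    H, W = depth.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T          # back-project to unit-depth rays
    pts = rays * depth[..., None]            # 3D points in the novel camera frame
    R, t = T_novel_to_ref[:3, :3], T_novel_to_ref[:3, 3]
    pts_ref = pts @ R.T + t                  # same points in the reference frame
    proj = pts_ref @ K.T                     # project with the pinhole model
    z = proj[..., 2]
    uv = proj[..., :2] / np.clip(z, 1e-8, None)[..., None]
    # A correspondence is valid if it lies in front of the camera and
    # rounds to a pixel inside the reference image.
    valid = (
        (z > 0)
        & (uv[..., 0] >= -0.5) & (uv[..., 0] <= W - 0.5)
        & (uv[..., 1] >= -0.5) & (uv[..., 1] <= H - 0.5)
    )
    return uv, valid

def sample_proxy_query(ref_feat, uv, valid):
    """Nearest-neighbor sample of (H, W, C) reference features at uv.

    Invalid correspondences get a zero feature (an arbitrary choice here).
    """
    H, W, _ = ref_feat.shape
    ui = np.clip(np.rint(uv[..., 0]).astype(int), 0, W - 1)
    vi = np.clip(np.rint(uv[..., 1]).astype(int), 0, H - 1)
    proxy = ref_feat[vi, ui]                 # fancy indexing returns a copy
    proxy[~valid] = 0.0
    return proxy
```

With an identity transform the field maps every pixel to itself, so the proxy query equals the reference feature map. A real pipeline would more likely use bilinear sampling on latent feature maps; nearest-neighbor is used here only to keep the sketch short.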

Why it matters

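Sparse-view 3D reconstruction is important in practice but hard: diffusion-based refinement methods that rely on multi-view self-attention break down when the rendered views are heavily corrupted. GeoQuery addresses this by building explicit geometric cues (predicted depth and camera poses) into the diffusion model's cross-view attention. According to the abstract, this enables robust reconstruction even under extreme view sparsity and can be integrated into existing diffusion-based pipelines.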

Original Abstract

3D Gaussian Splatting (3DGS) has emerged as a prominent paradigm for 3D reconstruction and novel view synthesis. However, it remains vulnerable to severe artifacts when trained under sparse-view constraints. While recent methods attempt to rectify artifacts in rendered views using image diffusion models, they typically rely on multi-view self-attention to retrieve information from reference images. We observe that this mechanism often fails when the rendered novel views output by 3DGS are heavily corrupted: damaged query features lead to erroneous cross-view retrieval, resulting in inconsistent rendering refinement. To address this, we propose GeoQuery, a geometry-guided diffusion framework that integrates generative priors with explicit geometric cues via a novel Geometry-guided Cross-view Attention (GCA) mechanism. First, by leveraging predicted depth maps and camera poses, we construct a geometry-induced correspondence field to sample reference features, forming a geometry-aligned proxy query that replaces the corrupted rendering features. Furthermore, we design a new cross-view feature aggregation pipeline, in which we restrict the cross-view attention to a local window around each proxy query to effectively retrieve useful features while suppressing spurious matches. GeoQuery can be seamlessly integrated into existing diffusion-based pipelines, enabling robust reconstruction even under extreme view sparsity. Extensive experiments on sparse-view novel view synthesis and rendering artifact removal demonstrate the effectiveness of our approach.
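
As a rough illustration of the local-window restriction described in the abstract, the sketch below attends from each proxy query only to reference tokens near its correspondence. It is a minimal sketch under stated assumptions, not the paper's GCA module: the function name `local_window_attention`, the square window, the `radius` parameter, and the single-head scaled dot-product form are all assumptions, and the per-pixel correspondence `uv` is taken as given.

```python
import numpy as np

def local_window_attention(proxy_q, ref_k, ref_v, uv, radius=1):
    """Single-head attention restricted to a window around each correspondence.

    proxy_q: (H, W, C) proxy queries for the novel view
    ref_k:   (Hr, Wr, C) reference keys
    ref_v:   (Hr, Wr, D) reference values
    uv:      (H, W, 2) correspondence (u, v) of each query in the reference view
    radius:  half-size of the square window (hypothetical parameter)
    """
    H, W, C = proxy_q.shape
    Hr, Wr, _ = ref_k.shape
    D = ref_v.shape[-1]
    out = np.zeros((H, W, D))
    centers = np.rint(uv).astype(int)
    for i in range(H):
        for j in range(W):
            cu, cv = centers[i, j]
            u0, u1 = max(cu - radius, 0), min(cu + radius, Wr - 1)
            v0, v1 = max(cv - radius, 0), min(cv + radius, Hr - 1)
            if u0 > u1 or v0 > v1:
                continue  # window lies entirely outside the reference view
            k = ref_k[v0:v1 + 1, u0:u1 + 1].reshape(-1, C)
            v = ref_v[v0:v1 + 1, u0:u1 + 1].reshape(-1, D)
            logits = k @ proxy_q[i, j] / np.sqrt(C)
            w = np.exp(logits - logits.max())   # numerically stable softmax
            w /= w.sum()
            out[i, j] = w @ v                   # weighted sum of window values
    return out
```

Because tokens outside the window never enter the softmax, a distant but similar-looking reference patch cannot be matched, which is the intuition behind suppressing spurious matches. With `radius=0` the window holds a single token, so the output is just the reference value at the correspondence.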
