ArXiv TLDR

Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself

2604.14048

Yuhang Dai, Xingyi Yang

cs.CV

TLDR

Free Geometry enables feed-forward 3D reconstruction models to self-adapt at test time using self-supervision from multiple views, improving accuracy.

Key contributions

  • Introduces Free Geometry, a framework for test-time self-evolution of 3D reconstruction models.
  • Leverages self-supervision by masking frames and enforcing cross-view feature consistency.
  • Achieves fast recalibration via lightweight LoRA updates, taking under 2 minutes per dataset on a single GPU.
  • Improves SOTA models (e.g., Depth Anything 3 and VGGT) by an average of 3.73% in camera pose accuracy and 2.88% in point map prediction across 4 benchmark datasets.
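The "lightweight LoRA updates" in the bullets above can be illustrated with a generic low-rank adapter (hedged: this is the standard LoRA parameterization, not necessarily the paper's exact one). Only the small factors `A` and `B` are trained; the pretrained weight `W0` stays frozen, which is why recalibration is fast.

```python
import numpy as np

# Generic LoRA-style update (illustrative, not the paper's exact code):
# the frozen weight W0 is adapted as W = W0 + B @ A, training only A, B.
rng = np.random.default_rng(1)
d, r = 8, 2                        # feature dim, LoRA rank
W0 = rng.normal(size=(d, d))       # frozen pretrained weights
A = rng.normal(size=(r, d))        # trainable down-projection (random init)
B = np.zeros((d, r))               # trainable up-projection (zero init)

x = rng.normal(size=(d,))
y_base = x @ W0
y_lora = x @ (W0 + B @ A)
# Zero-initialized B means the adapted model starts identical to the base:
assert np.allclose(y_base, y_lora)

# Far fewer trainable parameters than full fine-tuning:
print(2 * r * d, "trainable vs", d * d, "full")
```

Zero-initializing `B` guarantees the adapter is a no-op before test-time training begins, so the base model's behavior is the starting point for recalibration.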

Why it matters

Feed-forward 3D reconstruction models are efficient but struggle with test-time adaptability. This paper introduces a novel self-supervised framework that allows these models to refine their geometry using additional views without ground truth. This significantly enhances accuracy and robustness in challenging real-world scenarios.

Original Abstract

Feed-forward 3D reconstruction models are efficient but rigid: once trained, they perform inference in a zero-shot manner and cannot adapt to the test scene. As a result, visually plausible reconstructions often contain errors, particularly under occlusions, specularities, and ambiguous cues. To address this, we introduce Free Geometry, a framework that enables feed-forward 3D reconstruction models to self-evolve at test time without any 3D ground truth. Our key insight is that, when the model receives more views, it produces more reliable and view-consistent reconstructions. Leveraging this property, given a testing sequence, we mask a subset of frames to construct a self-supervised task. Free Geometry enforces cross-view feature consistency between representations from full and partial observations, while maintaining the pairwise relations implied by the held-out frames. This self-supervision allows for fast recalibration via lightweight LoRA updates, taking less than 2 minutes per dataset on a single GPU. Our approach consistently improves state-of-the-art foundation models, including Depth Anything 3 and VGGT, across 4 benchmark datasets, yielding an average improvement of 3.73% in camera pose accuracy and 2.88% in point map prediction. Code is available at https://github.com/hiteacherIamhumble/Free-Geometry .
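The self-supervision loop described in the abstract, mask some frames, then pull partial-view features toward the (detached) full-view features while updating only low-rank factors, can be sketched as below. Everything here is a toy stand-in: the "encoder" is a linear map plus a cross-view mean term (a crude proxy for attention, so that features depend on which views are present), not the actual Free Geometry, Depth Anything 3, or VGGT model.

```python
import numpy as np

# Toy sketch of masked-view self-supervision with LoRA-style updates.
rng = np.random.default_rng(0)
N, D, R = 6, 16, 4                          # views, feature dim, LoRA rank
X = rng.normal(size=(N, D))                 # toy per-view inputs
W0 = rng.normal(size=(D, D)) / np.sqrt(D)   # frozen base weights
A = np.zeros((R, D))                        # LoRA factors: W = W0 + B @ A
B = rng.normal(size=(D, R)) / np.sqrt(D)

def encode(views, A, B):
    # Per-view projection plus a cross-view mean term, so the output
    # depends on WHICH views are present (stand-in for attention).
    H = views @ (W0 + B @ A)
    return H + H.mean(axis=0, keepdims=True)

mask = {1, 4}                               # held-out frames
keep = [i for i in range(N) if i not in mask]
target = encode(X, A, B)[keep]              # full-view features, kept fixed

def loss(A, B):
    return np.mean((encode(X[keep], A, B) - target) ** 2)

init_loss, lr = loss(A, B), 0.05
Xk = X[keep]
# Because the encoder is linear + mean, encode(Xk) == M @ W with:
M = Xk + np.tile(Xk.mean(axis=0), (len(keep), 1))

for _ in range(200):                        # fast test-time recalibration
    resid = M @ (W0 + B @ A) - target
    G = 2 * M.T @ resid / resid.size        # gradient of loss w.r.t. W
    A, B = A - lr * (B.T @ G), B - lr * (G @ A.T)   # chain rule: W = W0 + B A

final_loss = loss(A, B)
print(f"loss {init_loss:.4f} -> {final_loss:.4f}")
```

In the toy setting the partial-view features drift toward the full-view targets as the LoRA factors train, mirroring the paper's claim that richer view sets yield more reliable features worth distilling into the model at test time.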

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.