ArXiv TLDR

Long-tail Internet photo reconstruction

2604.22714

Yuan Li, Yuanbo Xiangli, Hadar Averbuch-Elor, Noah Snavely, Ruojin Cai

cs.CV

TLDR

A new dataset and sampling strategy enable 3D foundation models to robustly reconstruct sparse, long-tail internet photo collections.

Key contributions

  • Addresses the challenge of 3D reconstruction for sparse, "long-tail" internet photo collections.
  • Introduces MegaDepth-X, a large dataset of 3D reconstructions with clean, dense depth, used to simulate supervision for sparse scenes.
  • Proposes a novel sampling strategy to mimic real-world long-tail camera distributions.
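  • Achieves robust 3D reconstructions from extremely sparse imagery and more reliable reconstruction of symmetric and repetitive scenes, while preserving generalization to standard dense benchmarks.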

Why it matters

Most real-world sites are sparsely photographed, which puts their 3D reconstruction beyond the reach of current classical and learned methods. This work shows that finetuning 3D foundation models on sparse subsets sampled from well-reconstructed landmarks makes them robust in these "long-tail" scenarios without sacrificing performance on standard, dense benchmarks.

Original Abstract

Internet photo collections exhibit an extremely long-tailed distribution: a few famous landmarks are densely photographed and easily reconstructed in 3D, while most real-world sites are represented with sparse, noisy, uneven imagery beyond the capabilities of both classical and learned 3D methods. We believe that tackling this long-tail regime represents one of the next frontiers for 3D foundation models. Although reliable ground-truth 3D supervision from sparse scenes is challenging to acquire, we observe that it can be effectively simulated by sampling sparse subsets from well-reconstructed Internet landmarks. To this end, we introduce MegaDepth-X, a large dataset of 3D reconstructions with clean, dense depth, together with a strategy for sampling sets of training images that mimic camera distributions in long-tail scenes. Finetuning 3D foundation models with these components yields robust reconstructions under extreme sparsity, and also enables more reliable reconstruction in symmetric and repetitive scenes, while preserving generalization to standard, dense 3D benchmark datasets.
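The idea of simulating sparse scenes by subsampling a densely photographed landmark can be illustrated with a minimal sketch. This is not the paper's sampling strategy, which the abstract does not specify; the function name `sample_sparse_subset`, the power-law size distribution, and the parameters `min_size`, `max_size`, and `alpha` are all assumptions made here for illustration.

```python
import random


def sample_sparse_subset(image_ids, min_size=2, max_size=16, alpha=1.5, rng=None):
    """Pick a small random subset of images from a densely photographed scene.

    Hypothetical illustration only: the subset size is drawn from a
    truncated power law (weight proportional to size ** -alpha), so small
    subsets are the most common, loosely imitating a long-tailed
    distribution of photo counts. The paper's actual strategy may differ.
    """
    rng = rng or random.Random()
    max_size = min(max_size, len(image_ids))
    if max_size < min_size:
        raise ValueError("scene has fewer images than min_size")

    # Candidate subset sizes and their (unnormalized) power-law weights.
    sizes = list(range(min_size, max_size + 1))
    weights = [s ** -alpha for s in sizes]

    # Draw one size, then sample that many images without replacement.
    k = rng.choices(sizes, weights=weights, k=1)[0]
    return rng.sample(image_ids, k)


# Example: a made-up scene with 500 images, sampled with a fixed seed.
scene = [f"img_{i:04d}.jpg" for i in range(500)]
subset = sample_sparse_subset(scene, rng=random.Random(0))
```

Because the dense scene already has a reliable reconstruction, each sampled subset inherits ground-truth depth and camera poses, which is what makes it usable as supervision for the sparse setting. A sampler that also accounts for camera positions or viewpoint overlap would be a natural extension, but that goes beyond what the abstract describes.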
