Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective

April 15, 2026 | arXiv:2604.14025

Weijie Wang, Qihang Cao, Sensen Gao, Donny Y. Chen, Haofei Xu + 8 more

cs.CV, cs.AI, cs.GR

TLDR

This survey introduces a problem-driven taxonomy for feed-forward 3D scene modeling, focusing on model design strategies over output representations.

Key contributions

Proposes a novel problem-driven taxonomy for feed-forward 3D scene modeling, agnostic to output format.
Categorizes research into five key problems: feature enhancement, geometry awareness, model efficiency, augmentation strategies, and temporal-aware models.
Reviews existing benchmarks, datasets, and real-world applications of feed-forward 3D models.
Discusses open challenges and outlines future research directions in 3D scene modeling.

Why it matters

This survey organizes feed-forward 3D reconstruction around core model design problems rather than output representations. The problem-driven perspective gives a clearer framework for understanding the field, identifying key challenges, and guiding future research toward scalable and efficient 3D scene modeling.

Original Abstract

Reconstructing 3D representations from 2D inputs is a fundamental task in computer vision and graphics, serving as a cornerstone for understanding and interacting with the physical world. While traditional methods achieve high fidelity, they are limited by slow per-scene optimization or category-specific training, which hinders their practical deployment and scalability. Hence, generalizable feed-forward 3D reconstruction has witnessed rapid development in recent years. By learning a model that maps images directly to 3D representations in a single forward pass, these methods enable efficient reconstruction and robust cross-scene generalization. Our survey is motivated by a critical observation: despite the diverse geometric output representations, ranging from implicit fields to explicit primitives, existing feed-forward approaches share similar high-level architectural patterns, such as image feature extraction backbones, multi-view information fusion mechanisms, and geometry-aware design principles. Consequently, we abstract away from these representation differences and instead focus on model design, proposing a novel taxonomy centered on model design strategies that are agnostic to the output format. Our proposed taxonomy organizes the research directions into five key problems that drive recent research development: feature enhancement, geometry awareness, model efficiency, augmentation strategies and temporal-aware models. To support this taxonomy with empirical grounding and standardized evaluation, we further comprehensively review related benchmarks and datasets, and extensively discuss and categorize real-world applications based on feed-forward 3D models. Finally, we outline future directions to address open challenges such as scalability, evaluation standards, and world modeling.
