OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence
Jianhui Liu, Haoze Sun, Wenbo Li, Yanbing Zhang, Rui Yang + 9 more
TLDR
OpenSpatial is an open-source data engine, released with a 3M-sample dataset. Models trained on the dataset reach state-of-the-art results across spatial reasoning benchmarks, with the best model improving by 19% on average in relative terms.
Key contributions
- Introduces OpenSpatial, an open-source data engine for high-quality, scalable spatial data generation.
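- Uses 3D bounding boxes as the basic primitive to build a data hierarchy across five tasks: Spatial Measurement (SM), Spatial Relationship (SR), Camera Perception (CP), Multi-view Consistency (MC), and Scene-Aware Reasoning (SAR).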
- Curates OpenSpatial-3M, a large-scale dataset with 3 million high-fidelity samples for spatial intelligence.
- Models trained on OpenSpatial-3M achieve state-of-the-art performance across spatial reasoning benchmarks; the best-performing model shows a 19% average relative improvement.
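To make the bounding-box idea concrete, here is a minimal sketch of how an annotated 3D box could be turned into question-answer samples for two of the five tasks (SM and SR). It is not the OpenSpatial implementation: the `Box3D` class, the function names, the question templates, the axis-aligned boxes, and the camera-frame convention (z pointing forward, units in meters) are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Box3D:
    """Hypothetical axis-aligned 3D box in an assumed camera frame (meters)."""
    label: str
    center: Tuple[float, float, float]  # (x, y, z), with z assumed to point forward
    size: Tuple[float, float, float]    # (width, height, depth)


def center_distance(a: Box3D, b: Box3D) -> float:
    """Euclidean distance between the two box centers."""
    return math.dist(a.center, b.center)


def measurement_sample(a: Box3D, b: Box3D) -> Dict[str, str]:
    """Build a Spatial Measurement (SM) style sample from two boxes."""
    distance = center_distance(a, b)
    return {
        "task": "SM",
        "question": f"How far apart are the {a.label} and the {b.label}, in meters?",
        "answer": f"{distance:.2f}",
    }


def relationship_sample(a: Box3D, b: Box3D) -> Dict[str, str]:
    """Build a Spatial Relationship (SR) style sample: which box is nearer the camera."""
    # Smaller z means closer to the camera under the assumed convention.
    closer = a if a.center[2] < b.center[2] else b
    return {
        "task": "SR",
        "question": f"Which is closer to the camera, the {a.label} or the {b.label}?",
        "answer": closer.label,
    }
```

Because both samples are derived from the same pair of boxes, one geometric annotation can feed several task types, which is the kind of reuse a box-based primitive makes possible.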
Why it matters
This paper addresses a gap in spatial intelligence research: most prior work produces domain-specific data, and no principled, open-source engine existed for generating high-quality spatial data at scale. OpenSpatial supplies both the engine and a 3-million-sample dataset, which matters for advancing AI's understanding of the physical world. The reported 19% average relative improvement for the best-performing model indicates that the data translates directly into stronger spatial reasoning.
Original Abstract
Spatial understanding is a fundamental cornerstone of human-level intelligence. Nonetheless, current research predominantly focuses on domain-specific data production, leaving a critical void: the absence of a principled, open-source engine capable of fully unleashing the potential of high-quality spatial data. To bridge this gap, we elucidate the design principles of a robust data generation system and introduce OpenSpatial -- an open-source data engine engineered for high quality, extensive scalability, broad task diversity, and optimized efficiency. OpenSpatial adopts 3D bounding boxes as the fundamental primitive to construct a comprehensive data hierarchy across five foundational tasks: Spatial Measurement (SM), Spatial Relationship (SR), Camera Perception (CP), Multi-view Consistency (MC), and Scene-Aware Reasoning (SAR). Leveraging this scalable infrastructure, we curate OpenSpatial-3M, a large-scale dataset comprising 3 million high-fidelity samples. Extensive evaluations demonstrate that versatile models trained on our dataset achieve state-of-the-art performance across a wide spectrum of spatial reasoning benchmarks. Notably, the best-performing model exhibits a substantial average improvement of 19 percent, relatively. Furthermore, we provide a systematic analysis of how data attributes influence spatial perception. By open-sourcing both the engine and the 3M-scale dataset, we provide a robust foundation to accelerate future research in spatial intelligence.