SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments
Dinging Li, Yingxiu Zhao, Xinrui Cheng, Kangheng Lin, Hongbo Peng + 14 more
TLDR
SpatialEvo uses deterministic geometric environments to enable self-evolving 3D spatial reasoning. It achieves the highest average score across nine benchmarks by training on precise, physically valid questions whose answers are computed directly from scene geometry.
Key contributions
- SpatialEvo uses Deterministic Geometric Environments (DGE) to generate precise, physically valid 3D spatial reasoning questions.
- DGE converts unannotated 3D scenes into zero-noise interactive oracles, replacing error-prone model consensus.
- A shared-parameter policy co-evolves as questioner and solver, with a scheduler focusing on the model's weakest categories.
- Achieves the highest average score across nine benchmarks at both 3B and 7B scales, with consistent gains on spatial reasoning and no degradation on general visual understanding.
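The task-adaptive scheduler in the third bullet can be illustrated with a minimal sketch. The summary only says it concentrates training on the weakest categories. The specific rule here is an assumption: sample a category with weight `1 - recent_accuracy + eps` over a sliding window. `TaskAdaptiveScheduler` and the category names are hypothetical and do not come from the paper.

```python
import random


class TaskAdaptiveScheduler:
    """Hypothetical scheduler that favors categories with low recent solver accuracy.

    The weighting rule (1 - accuracy + eps) is an illustrative assumption,
    not the rule used by SpatialEvo.
    """

    def __init__(self, categories, window=50, eps=0.05, seed=None):
        self.categories = list(categories)
        self.window = window  # how many recent outcomes to remember per category
        self.eps = eps        # keeps mastered categories from vanishing entirely
        self.history = {c: [] for c in self.categories}
        self.rng = random.Random(seed)

    def record(self, category, correct):
        """Store whether the solver answered a question of this category correctly."""
        h = self.history[category]
        h.append(1.0 if correct else 0.0)
        if len(h) > self.window:
            h.pop(0)  # drop the oldest outcome to keep a sliding window

    def accuracy(self, category):
        h = self.history[category]
        # Unseen categories count as 0 accuracy so they are explored early.
        return sum(h) / len(h) if h else 0.0

    def weights(self):
        return {c: 1.0 - self.accuracy(c) + self.eps for c in self.categories}

    def sample(self):
        """Pick the next category for the questioner, biased toward weak ones."""
        w = self.weights()
        return self.rng.choices(self.categories, weights=[w[c] for c in self.categories], k=1)[0]


sched = TaskAdaptiveScheduler(["distance", "direction"], seed=0)
for _ in range(10):
    sched.record("distance", True)    # solver already handles distance well
    sched.record("direction", False)  # solver struggles with direction
print(sched.weights())
```

Because weights are recomputed from recent outcomes, the curriculum shifts automatically as the solver improves. No hand-written ordering of categories is needed.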
Why it matters
SpatialEvo enables continuous improvement in 3D spatial reasoning without costly geometric annotation. It replaces pseudo-labels built from model consensus with deterministic geometric ground truth, so training corrects the model's geometric errors instead of reinforcing them. Because the oracle needs only unannotated 3D scenes with point clouds and camera poses, new training signal does not depend on human labeling, which is a key requirement for scaling embodied intelligence.
Original Abstract
Spatial reasoning over three-dimensional scenes is a core capability for embodied intelligence, yet continuous model improvement remains bottlenecked by the cost of geometric annotation. The self-evolving paradigm offers a promising path, but its reliance on model consensus to construct pseudo-labels causes training to reinforce rather than correct the model's own geometric errors. We identify a property unique to 3D spatial reasoning that circumvents this limitation: ground truth is a deterministic consequence of the underlying geometry, computable exactly from point clouds and camera poses without any model involvement. Building on this insight, we present SpatialEvo, a self-evolving framework for 3D spatial reasoning, centered on the Deterministic Geometric Environment (DGE). The DGE formalizes 16 spatial reasoning task categories under explicit geometric validation rules and converts unannotated 3D scenes into zero-noise interactive oracles, replacing model consensus with objective physical feedback. A single shared-parameter policy co-evolves across questioner and solver roles under DGE constraints: the questioner generates physically valid spatial questions grounded in scene observations, while the solver derives precise answers against DGE-verified ground truth. A task-adaptive scheduler endogenously concentrates training on the model's weakest categories, producing a dynamic curriculum without manual design. Experiments across nine benchmarks demonstrate that SpatialEvo achieves the highest average score at both 3B and 7B scales, with consistent gains on spatial reasoning benchmarks and no degradation on general visual understanding.
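The abstract's central claim is that ground truth follows deterministically from point clouds and camera poses. A minimal sketch makes this concrete for two hypothetical task types: centroid-to-centroid object distance, and "is A left or right of B from this view?". It also includes an illustrative validation rule that rejects ambiguous questions. Several details are assumptions rather than the paper's definitions: the task definitions, the OpenCV camera convention (x right, y down, z forward), the camera-to-world pose `(R, t)`, and the `min_gap` threshold. The DGE's actual 16 categories and rules are not reproduced here.

```python
import math


def centroid(points):
    """Mean of an object's 3D points (list of (x, y, z) tuples)."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def distance_between(obj_a, obj_b):
    """Hypothetical 'object distance' task: Euclidean distance between centroids."""
    return math.dist(centroid(obj_a), centroid(obj_b))


def world_to_camera(p, rotation, translation):
    """Map a world point into the camera frame for a camera-to-world pose (R, t).

    p_cam = R^T (p - t); rotation is a 3x3 nested list, translation a 3-tuple.
    """
    d = [p[i] - translation[i] for i in range(3)]
    # Component j of R^T d is the dot product of column j of R with d.
    return tuple(sum(rotation[i][j] * d[i] for i in range(3)) for j in range(3))


def left_or_right(obj_a, obj_b, rotation, translation, min_gap=0.1):
    """Answer 'Is A left or right of B in this view?', or None if the question is invalid.

    Assumes OpenCV camera axes: x right, y down, z forward.
    """
    a = world_to_camera(centroid(obj_a), rotation, translation)
    b = world_to_camera(centroid(obj_b), rotation, translation)
    # Illustrative validation rule 1: both objects must be in front of the camera.
    if a[2] <= 0 or b[2] <= 0:
        return None
    # Illustrative validation rule 2: horizontal separation must be unambiguous.
    if abs(a[0] - b[0]) < min_gap:
        return None
    return "left" if a[0] < b[0] else "right"


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
chair = [(-1.0, 0.0, 3.0), (-0.8, 0.2, 3.2)]
table = [(1.0, 0.0, 3.0), (1.2, 0.0, 3.0)]
print(distance_between(chair, table))
print(left_or_right(chair, table, IDENTITY, (0.0, 0.0, 0.0)))
```

No model is consulted at any step. The same scene and pose always produce the same answer, and a question that fails a rule yields `None` instead of a noisy label. In this sketch, viewing the scene from the opposite side (a 180° rotation about the y axis) flips the answer, which is exactly the kind of pose-dependent fact that consensus-based pseudo-labeling can get wrong.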