Geometric Context Transformer for Streaming 3D Reconstruction

April 15, 2026 · arXiv:2604.14141

Lin-Zhuo Chen, Jian Gao, Yihang Chen, Ka Leong Cheng, Yipengjing Sun + 6 more

cs.CV

TLDR

LingBot-Map introduces a Geometric Context Transformer for streaming 3D reconstruction, achieving efficient, accurate, and stable performance over long sequences.

Key contributions

Introduces LingBot-Map, a feed-forward 3D foundation model for streaming 3D reconstruction.
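Builds on a Geometric Context Transformer (GCT) architecture whose attention mechanism keeps the streaming state compact while retaining rich geometric context.
GCT's attention integrates an anchor context, a pose-reference window, and a trajectory memory, which address coordinate grounding, dense geometric cues, and long-range drift correction, respectively.
Achieves stable inference at around 20 FPS on 518 × 378 inputs over long sequences (>10,000 frames), outperforming both streaming and iterative optimization-based baselines.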
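The throughput and sequence-length figures above can be made concrete with some back-of-envelope arithmetic. The sketch below assumes a ViT-style backbone with 14 × 14 patches at the abstract's 518 × 378 input resolution. It also assumes an illustrative compact state made of one anchor frame, a 16-frame dense window, and one memory token per older frame. None of these sizes come from the paper, and the helper names are hypothetical.

```python
# Back-of-envelope sizing for a streaming reconstruction context.
# ASSUMPTIONS (not from the paper): 14x14 patch tokens, and the
# hypothetical anchor/window/memory sizes passed to compact_state_tokens.

PATCH = 14  # assumed ViT patch size


def tokens_per_frame(width: int, height: int, patch: int = PATCH) -> int:
    """Patch tokens for one frame (dimensions assumed divisible by patch)."""
    return (width // patch) * (height // patch)


def stream_duration_s(num_frames: int, fps: float) -> float:
    """Seconds needed to process a stream at a constant frame rate."""
    return num_frames / fps


def full_history_tokens(num_frames: int, per_frame: int) -> int:
    """Context size if every past frame kept all of its tokens."""
    return num_frames * per_frame


def compact_state_tokens(num_frames: int, per_frame: int, anchor_frames: int,
                         window_frames: int, memory_per_frame: int) -> int:
    """Hypothetical compact state: full tokens for anchor and window frames,
    a few memory tokens for every older frame."""
    dense = min(num_frames, anchor_frames + window_frames)
    older = max(0, num_frames - anchor_frames - window_frames)
    return dense * per_frame + older * memory_per_frame


if __name__ == "__main__":
    n = tokens_per_frame(518, 378)
    print(n)                                  # 999 (37 x 27 patches)
    print(stream_duration_s(10_000, 20.0))    # 500.0 seconds (~8.3 minutes)
    print(full_history_tokens(10_000, n))     # 9990000
    print(compact_state_tokens(10_000, n, 1, 16, 1))  # 26966
```

Under these assumptions, a full-history cache would hold roughly ten million tokens after 10,000 frames. The compact state would stay near 27,000 tokens. A per-frame memory token still grows linearly, but with a much smaller constant.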

Why it matters

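Unlike iterative optimization-based pipelines, LingBot-Map recovers camera poses and point clouds from a video stream with a feed-forward model. Because its Geometric Context Transformer keeps the streaming state compact while retaining rich geometric context, it runs in real time over sequences of more than 10,000 frames. The authors also report that it outperforms both streaming and iterative optimization-based approaches across a variety of benchmarks.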

Original Abstract

Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which necessitates geometric accuracy, temporal consistency, and computational efficiency. Motivated by the principles of Simultaneous Localization and Mapping (SLAM), we introduce LingBot-Map, a feed-forward 3D foundation model for reconstructing scenes from streaming data, built upon a geometric context transformer (GCT) architecture. A defining aspect of LingBot-Map lies in its carefully designed attention mechanism, which integrates an anchor context, a pose-reference window, and a trajectory memory to address coordinate grounding, dense geometric cues, and long-range drift correction, respectively. This design keeps the streaming state compact while retaining rich geometric context, enabling stable efficient inference at around 20 FPS on 518 x 378 resolution inputs over long sequences exceeding 10,000 frames. Extensive evaluations across a variety of benchmarks demonstrate that our approach achieves superior performance compared to both existing streaming and iterative optimization-based approaches.
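One way to picture the three-part attention context described in the abstract is as a streaming state that each new frame attends to. The sketch below is a hypothetical illustration, not the authors' implementation. The class and function names, the use of mean pooling to compress each evicted frame into a single memory token, and single-head attention are all assumptions chosen for brevity.

```python
from collections import deque

import numpy as np


class StreamingContext:
    """HYPOTHETICAL three-part streaming state (not the authors' code):
    - anchor: full tokens of the first `num_anchor` frames (coordinate grounding)
    - window: full tokens of the last `window_size` frames (dense geometric cues)
    - memory: one pooled token per frame evicted from the window (long-range cues)
    """

    def __init__(self, num_anchor: int, window_size: int):
        if window_size < 1:
            raise ValueError("window_size must be at least 1")
        self.num_anchor = num_anchor
        self.anchor: list[np.ndarray] = []
        self.window: deque = deque(maxlen=window_size)
        self.memory: list[np.ndarray] = []

    def push(self, frame_tokens: np.ndarray) -> None:
        """Store a processed frame's tokens, shape (num_tokens, dim)."""
        if len(self.anchor) < self.num_anchor:
            self.anchor.append(frame_tokens)
            return
        if len(self.window) == self.window.maxlen:
            # Compress the frame about to be evicted; mean pooling is an assumption.
            self.memory.append(self.window[0].mean(axis=0, keepdims=True))
        self.window.append(frame_tokens)  # deque drops the oldest frame itself

    def context_for(self, frame_tokens: np.ndarray) -> np.ndarray:
        """Keys/values a new frame attends to: state plus the frame's own tokens."""
        parts = self.anchor + list(self.window) + self.memory + [frame_tokens]
        return np.concatenate(parts, axis=0)


def attend(queries: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention; context acts as keys and values."""
    scores = queries @ context.T / np.sqrt(queries.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ context


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = StreamingContext(num_anchor=1, window_size=4)
    for _ in range(100):
        frame = rng.standard_normal((8, 16))  # toy sizes: 8 tokens, 16 dims
        out = attend(frame, state.context_for(frame))
        state.push(frame)
    # 1 anchor frame, 4 window frames, 95 memory tokens after 100 frames
    print(len(state.anchor), len(state.window), len(state.memory))
```

In this sketch only the window is bounded. The memory grows by one token per frame, which is one simple way to keep the state small while still exposing the whole trajectory to attention.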


