GGD-SLAM: Monocular 3DGS SLAM Powered by Generalizable Motion Model for Dynamic Environments
Yi Liu, Haoxuan Xu, Hongbo Duan, Keyu Fan, Zhengyang Zhang, et al.
TLDR
GGD-SLAM is a monocular 3DGS SLAM system that uses a generalizable motion model for robust localization and dense mapping in dynamic environments.
Key contributions
- Introduces GGD-SLAM, a monocular 3DGS SLAM for robust localization and mapping in dynamic scenes.
- Uses a FIFO queue and sequential attention for dynamic semantic feature extraction.
- Employs a dynamic feature enhancer to separate static and dynamic scene components.
- Proposes a distractor-adaptive SSIM loss and occlusion filling via static-information sampling to improve robustness.
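The distractor-adaptive SSIM loss can be pictured as standard SSIM whose statistics are reweighted by a per-pixel static-probability mask, so dynamic distractors contribute less to the photometric error. The sketch below is a minimal, hypothetical illustration of that idea; the function names, the global (rather than windowed) SSIM, and the mask semantics are assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: SSIM statistics weighted by a static-probability
# mask (1.0 = static pixel, 0.0 = dynamic distractor). Global SSIM over
# flat grayscale images in [0, 1]; the paper's real loss may be windowed.

def weighted_ssim(x, y, w, c1=0.01 ** 2, c2=0.03 ** 2):
    ws = sum(w)
    mu_x = sum(wi * xi for wi, xi in zip(w, x)) / ws
    mu_y = sum(wi * yi for wi, yi in zip(w, y)) / ws
    var_x = sum(wi * (xi - mu_x) ** 2 for wi, xi in zip(w, x)) / ws
    var_y = sum(wi * (yi - mu_y) ** 2 for wi, yi in zip(w, y)) / ws
    cov = sum(wi * (xi - mu_x) * (yi - mu_y)
              for wi, xi, yi in zip(w, x, y)) / ws
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def distractor_adaptive_ssim_loss(render, target, static_mask):
    # 1 - SSIM, with dynamic pixels suppressed by the mask weights.
    return 1.0 - weighted_ssim(render, target, static_mask)
```

With a mask that zeros out a pixel corrupted by a moving object, the loss falls back to comparing only the static content, which is the resilience the contribution aims for.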
Why it matters
Existing 3DGS SLAM systems assume a static scene and degrade sharply in dynamic environments. GGD-SLAM addresses this with a generalizable motion model, enabling robust localization and dense mapping in real-world dynamic scenes, which is critical for robotics and AR applications.
Original Abstract
Visual SLAM algorithms achieve significant improvements through the exploration of 3D Gaussian Splatting (3DGS) representations, particularly in generating high-fidelity dense maps. However, they depend on a static environment assumption and experience significant performance degradation in dynamic environments. This paper presents GGD-SLAM, a framework that employs a generalizable motion model to address the challenges of localization and dense mapping in dynamic environments - without predefined semantic annotations or depth input. Specifically, the proposed system employs a First-In-First-Out (FIFO) queue to manage incoming frames, facilitating dynamic semantic feature extraction through a sequential attention mechanism. This is integrated with a dynamic feature enhancer to separate static and dynamic components. Additionally, to minimize dynamic distractors' impact on the static components, we devise a method to fill occluded areas via static information sampling and design a distractor-adaptive Structure Similarity Index Measure (SSIM) loss tailored for dynamic environments, significantly enhancing the system's resilience. Experiments conducted on real-world dynamic datasets demonstrate that the proposed system achieves state-of-the-art performance in camera pose estimation and dense reconstruction in dynamic scenes.
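The abstract's frame-management scheme, a bounded FIFO queue whose temporal window feeds a sequential attention mechanism, can be sketched with a few lines of stdlib Python. The queue length and the `window()` interface are illustrative assumptions; the paper's actual buffer size and attention design are not specified here.

```python
# Minimal sketch of a FIFO frame buffer: incoming frames enter a bounded
# queue, the oldest frame is dropped first, and the current temporal
# window is handed to a downstream sequential-attention module.
from collections import deque

class FrameQueue:
    def __init__(self, maxlen=5):
        # deque with maxlen silently evicts the oldest element on append.
        self.frames = deque(maxlen=maxlen)

    def push(self, frame):
        self.frames.append(frame)

    def window(self):
        # Frames ordered oldest -> newest, as a sequence model expects.
        return list(self.frames)
```

A sequential attention module would then attend across `window()` to pick out features that move inconsistently with the camera, i.e. dynamic semantic features.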