FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction
Zeyu Jiang, Changqing Zhou, Xingxing Zuo, Changhao Chen
TLDR
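FreeOcc is a training-free framework for open-vocabulary occupancy prediction from monocular or RGB-D sequences. It achieves over 2x the IoU and mIoU of prior self-supervised methods on EmbodiedOcc-ScanNet, and in zero-shot transfer to novel environments it outperforms both supervised and self-supervised baselines.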
Key contributions
- Introduces FreeOcc, a training-free, open-vocabulary occupancy prediction framework.
- Operates without 3D annotations, pose ground truth, or any learning stage.
- Achieves over 2x IoU/mIoU improvements on EmbodiedOcc-ScanNet vs. self-supervised methods.
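- Introduces ReplicaOcc, a benchmark for indoor open-vocabulary occupancy prediction.
- Transfers zero-shot to novel environments, substantially outperforming both supervised and self-supervised baselines.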
Why it matters
Existing learning-based occupancy prediction methods depend on large-scale 3D annotations and generalize poorly across environments. FreeOcc removes the need for 3D annotations, ground-truth camera poses, and training altogether, while still transferring zero-shot to unseen scenes. That combination makes it a practical candidate for robotic applications in diverse environments where labeled 3D data is unavailable.
Original Abstract
Existing learning-based occupancy prediction methods rely on large-scale 3D annotations and generalize poorly across environments. We present FreeOcc, a training-free framework for open-vocabulary occupancy prediction from monocular or RGB-D sequences. Unlike prior approaches that require voxel-level supervision and ground-truth camera poses, FreeOcc operates without 3D annotations, pose ground truth, or any learning stage. FreeOcc incrementally builds a globally consistent occupancy map via a four-layer pipeline: a SLAM backbone estimates poses and sparse geometry; a geometrically consistent Gaussian update constructs dense 3D Gaussian maps; open-vocabulary semantics from off-the-shelf vision-language models are associated with Gaussian primitives; and a probabilistic Gaussian-to-occupancy projection produces dense voxel occupancy. Despite being entirely training-free and pose-agnostic, FreeOcc achieves over $2\times$ improvements in IoU and mIoU on EmbodiedOcc-ScanNet compared to prior self-supervised methods. We further introduce ReplicaOcc, a benchmark for indoor open-vocabulary occupancy prediction, and show that FreeOcc transfers zero-shot to novel environments, substantially outperforming both supervised and self-supervised baselines. Project page: https://the-masses.github.io/freeocc-web/.
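The abstract does not spell out the Gaussian-to-occupancy projection, so the following is only a minimal sketch of what such a step could look like, not the authors' implementation. It assumes isotropic Gaussians, a "probabilistic OR" rule for combining their contributions (occupancy = 1 - product of per-Gaussian free probabilities), and opacity-weighted label voting. The names `Gaussian`, `occupancy_at`, `voxelize`, and the `threshold` parameter are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    """Hypothetical semantic Gaussian primitive (isotropic for simplicity)."""
    mean: Vec3                     # centre in world coordinates (metres)
    sigma: float                   # isotropic standard deviation (assumption)
    opacity: float                 # in [0, 1]
    label_probs: Dict[str, float]  # open-vocabulary label scores


def occupancy_at(point: Vec3, gaussians: List[Gaussian]) -> Tuple[float, Optional[str]]:
    """Return (occupancy probability, most likely label) at one 3D point."""
    p_free = 1.0
    label_weight: Dict[str, float] = {}
    for g in gaussians:
        # Squared distance from the query point to the Gaussian centre.
        d2 = sum((p - m) ** 2 for p, m in zip(point, g.mean))
        # Contribution of this Gaussian: opacity times unnormalised density.
        w = g.opacity * math.exp(-0.5 * d2 / (g.sigma ** 2))
        # Assumed combination rule: the point is free only if every
        # Gaussian independently leaves it free.
        p_free *= 1.0 - w
        for name, prob in g.label_probs.items():
            label_weight[name] = label_weight.get(name, 0.0) + w * prob
    label = max(label_weight, key=label_weight.get) if label_weight else None
    return 1.0 - p_free, label


def voxelize(
    gaussians: List[Gaussian],
    origin: Vec3,
    voxel_size: float,
    dims: Tuple[int, int, int],
    threshold: float = 0.5,  # assumed occupancy cut-off
) -> Dict[Tuple[int, int, int], Optional[str]]:
    """Map occupied voxel indices to labels; free voxels are omitted."""
    grid: Dict[Tuple[int, int, int], Optional[str]] = {}
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                centre = tuple(
                    o + (idx + 0.5) * voxel_size
                    for o, idx in zip(origin, (i, j, k))
                )
                p_occ, label = occupancy_at(centre, gaussians)
                if p_occ >= threshold:
                    grid[(i, j, k)] = label
    return grid


# Usage: one "chair" Gaussian centred in the first of two 1 m voxels.
chair = Gaussian((0.5, 0.5, 0.5), sigma=0.2, opacity=0.9,
                 label_probs={"chair": 0.8, "table": 0.2})
occupied = voxelize([chair], origin=(0.0, 0.0, 0.0), voxel_size=1.0, dims=(2, 1, 1))
```

In this sketch a sparse dictionary keeps only occupied voxels, which suits maps that grow incrementally. A real system would likely use full covariance matrices and a spatial index rather than the brute-force loop shown here.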