FreeOcc: Training-Free Embodied Open-Vocabulary Occupancy Prediction

April 30, 2026 | arXiv:2604.28115

Zeyu Jiang, Changqing Zhou, Xingxing Zuo, Changhao Chen

cs.RO, cs.CV

TLDR

FreeOcc is a training-free framework for open-vocabulary occupancy prediction from monocular or RGB-D sequences. In zero-shot transfer to novel environments it outperforms both supervised and self-supervised baselines.

Key contributions

Introduces FreeOcc, a training-free, open-vocabulary occupancy prediction framework.
Operates without 3D annotations, pose ground truth, or any learning stage.
Achieves over 2x IoU/mIoU improvements on EmbodiedOcc-ScanNet vs. self-supervised methods.
Introduces ReplicaOcc, a benchmark for indoor open-vocabulary occupancy prediction.
Transfers zero-shot to novel environments, substantially outperforming both supervised and self-supervised baselines.
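
For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how IoU and mIoU are commonly computed on a flattened voxel grid. It is not the benchmark's evaluation code: the label convention (`EMPTY = 0`), the function names, and the choice to skip classes absent from both grids are assumptions made for illustration, and the actual EmbodiedOcc-ScanNet protocol may differ (for example, by masking unobserved voxels).

```python
from typing import Iterable, Sequence

EMPTY = 0  # assumed label for free space (hypothetical convention)


def occupancy_iou(pred: Sequence[int], gt: Sequence[int], empty: int = EMPTY) -> float:
    """Class-agnostic IoU: every non-empty voxel counts as 'occupied'."""
    inter = sum(1 for p, g in zip(pred, gt) if p != empty and g != empty)
    union = sum(1 for p, g in zip(pred, gt) if p != empty or g != empty)
    return inter / union if union else 0.0


def mean_iou(pred: Sequence[int], gt: Sequence[int], classes: Iterable[int]) -> float:
    """Mean of per-class IoU over the given semantic classes.

    Classes that appear in neither grid are skipped (an assumption;
    some protocols count them as zero instead).
    """
    ious = []
    for c in classes:
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy flattened grids: 0 = empty, 1 and 2 = two semantic classes.
    pred = [0, 1, 1, 2, 2, 0]
    gt = [0, 1, 2, 2, 0, 0]
    print(occupancy_iou(pred, gt))
    print(mean_iou(pred, gt, classes=[1, 2]))
```

IoU measures only geometric agreement (occupied versus empty), while mIoU also penalizes assigning the wrong semantic class to an occupied voxel, which is why the two numbers are reported separately.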

Why it matters

Existing learning-based occupancy prediction methods depend on large-scale 3D annotations and generalize poorly across environments. FreeOcc removes both the annotation requirement and the training stage, and it does not need ground-truth camera poses. Because it transfers zero-shot to unseen environments, it is practical for robots that must build semantic occupancy maps in scenes they were never trained on.

Original Abstract

Existing learning-based occupancy prediction methods rely on large-scale 3D annotations and generalize poorly across environments. We present FreeOcc, a training-free framework for open-vocabulary occupancy prediction from monocular or RGB-D sequences. Unlike prior approaches that require voxel-level supervision and ground-truth camera poses, FreeOcc operates without 3D annotations, pose ground truth, or any learning stage. FreeOcc incrementally builds a globally consistent occupancy map via a four-layer pipeline: a SLAM backbone estimates poses and sparse geometry; a geometrically consistent Gaussian update constructs dense 3D Gaussian maps; open-vocabulary semantics from off-the-shelf vision-language models are associated with Gaussian primitives; and a probabilistic Gaussian-to-occupancy projection produces dense voxel occupancy. Despite being entirely training-free and pose-agnostic, FreeOcc achieves over $2\times$ improvements in IoU and mIoU on EmbodiedOcc-ScanNet compared to prior self-supervised methods. We further introduce ReplicaOcc, a benchmark for indoor open-vocabulary occupancy prediction, and show that FreeOcc transfers zero-shot to novel environments, substantially outperforming both supervised and self-supervised baselines. Project page: https://the-masses.github.io/freeocc-web/.
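
The abstract does not spell out the probabilistic Gaussian-to-occupancy projection, so the following is only an illustrative sketch of one plausible form, not the authors' method. It assumes isotropic Gaussians, treats each Gaussian's opacity-weighted density at a voxel center as an independent occupancy probability, and combines them as 1 minus the product of the free-space probabilities. The `Gaussian` class, the function names, the weighted-vote labeling, and the 0.5 threshold are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    """Hypothetical isotropic Gaussian primitive with a semantic label."""
    mean: Vec3      # center in metres
    sigma: float    # isotropic standard deviation in metres
    opacity: float  # in [0, 1]
    label: str      # open-vocabulary class name attached to the primitive


def voxel_occupancy(center: Vec3, gaussians: List[Gaussian]) -> Tuple[float, Optional[str]]:
    """Return (occupancy probability, most-supported label) at a voxel center."""
    p_free = 1.0
    votes: Dict[str, float] = {}
    for g in gaussians:
        d2 = sum((c - m) ** 2 for c, m in zip(center, g.mean))
        w = g.opacity * math.exp(-0.5 * d2 / (g.sigma ** 2))
        p_free *= 1.0 - w  # assumed independence between primitives
        votes[g.label] = votes.get(g.label, 0.0) + w
    label = max(votes, key=votes.get) if votes else None
    return 1.0 - p_free, label


def voxelize(gaussians: List[Gaussian], origin: Vec3, voxel_size: float,
             dims: Tuple[int, int, int], threshold: float = 0.5) -> Dict[Tuple[int, int, int], str]:
    """Map voxel indices to labels for voxels whose probability meets the threshold."""
    grid: Dict[Tuple[int, int, int], str] = {}
    for i in range(dims[0]):
        for j in range(dims[1]):
            for k in range(dims[2]):
                center = tuple(o + (n + 0.5) * voxel_size
                               for o, n in zip(origin, (i, j, k)))
                p, label = voxel_occupancy(center, gaussians)
                if label is not None and p >= threshold:
                    grid[(i, j, k)] = label
    return grid


if __name__ == "__main__":
    scene = [Gaussian(mean=(0.05, 0.05, 0.05), sigma=0.05, opacity=0.9, label="chair")]
    print(voxelize(scene, origin=(0.0, 0.0, 0.0), voxel_size=0.1, dims=(2, 1, 1)))
```

A real system would use anisotropic covariances and restrict each voxel to nearby Gaussians through a spatial index rather than looping over all primitives. The sketch only shows how a continuous Gaussian map can be turned into discrete, labeled voxels.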
