MiniVLA-Nav v1: A Multi-Scene Simulation Dataset for Language-Conditioned Robot Navigation
TLDR
MiniVLA-Nav v1 is a multi-scene simulation dataset for language-conditioned robot navigation: 1,174 object-approach episodes recorded across four Isaac Sim environments.
Key contributions
- Introduces MiniVLA-Nav v1, a simulation dataset for Language-Conditioned Object Approach (LCOA) navigation.
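- Features 1,174 episodes across four photorealistic Isaac Sim environments (Office, Hospital, Full Warehouse, and Warehouse with Multiple Shelves).
- Provides synchronized 640x640 RGB images, metric depth maps, and instance segmentation masks, plus expert action labels in two forms: continuous (v, omega) and 7x7 tokenized.
- Ensures trajectory diversity via three spawn-distance tiers (near, mid, far), 12 object categories, 18 training templates, and 12 paraphrase-OOD templates.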
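The abstract gives numeric ranges for the near and mid spawn tiers (1.5-3.5 m and 3.5-7.0 m) and a 1 m stopping radius. The sketch below shows one way those numbers could be turned into helper checks. The function names, the handling of the shared 3.5 m boundary, and the use of planar distance to a single target point are all assumptions made for illustration; the far tier is defined by curated points rather than a distance band, so it is not classified here.

```python
import math

# Values taken from the abstract (metres).
NEAR_RANGE = (1.5, 3.5)
MID_RANGE = (3.5, 7.0)
STOP_RADIUS_M = 1.0


def spawn_tier(distance_m: float) -> str:
    """Classify a spawn distance as 'near', 'mid', or 'other'.

    Assumption: 3.5 m is assigned to 'mid'. The 'far' tier uses curated
    global points, so any distance outside both bands is reported as 'other'.
    """
    if NEAR_RANGE[0] <= distance_m < NEAR_RANGE[1]:
        return "near"
    if MID_RANGE[0] <= distance_m <= MID_RANGE[1]:
        return "mid"
    return "other"


def within_stop_radius(robot_xy, target_xy, radius_m: float = STOP_RADIUS_M) -> bool:
    """Return True if the robot is within radius_m of the target.

    Assumption: distance is measured in the ground plane between two points.
    """
    return math.dist(robot_xy, target_xy) <= radius_m
```

For example, `spawn_tier(2.0)` falls in the near band, and `within_stop_radius((0.0, 0.0), (0.3, 0.4))` is true because the two points are 0.5 m apart.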
Why it matters
This dataset gives researchers a shared resource for developing and evaluating language-conditioned navigation policies in simulation. Its four environments and synchronized RGB, depth, and segmentation streams support work on language grounding, perception, and control. The five evaluation splits allow standardized benchmarking of in-distribution accuracy, template-paraphrase robustness, and OOD object-category generalization.
Original Abstract
We present MiniVLA-Nav v1, a simulation dataset for Language-Conditioned Object Approach (LCOA) navigation: given a short natural-language instruction, an NVIDIA Nova Carter differential-drive robot must navigate to the named object and stop within 1 m across four photorealistic Isaac Sim environments (Office, Hospital, Full Warehouse, and Warehouse with Multiple Shelves). Each of the 1,174 episodes pairs an instruction with synchronized 640x640 RGB images, metric depth maps (float32, metres), and instance segmentation masks, together with continuous (v,omega) and 7x7 tokenized expert action labels recorded at 60 Hz from a vision-based proportional controller. Trajectory diversity is ensured through three spawn-distance tiers (near: 1.5-3.5 m, mid: 3.5-7.0 m, far: global curated points; Pearson r=0.94 between spawn distance and trajectory length), 12 object categories, 18 training templates, and 12 paraphrase-OOD templates. Five evaluation splits support in-distribution accuracy, template-paraphrase robustness, and OOD object-category benchmarking. The dataset is publicly available at https://huggingface.co/datasets/alibustami/miniVLA-Nav
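The abstract mentions 7x7 tokenized action labels alongside the continuous (v, omega) commands but does not spell out the binning. One plausible reading is a uniform grid with 7 bins per dimension, giving 49 discrete tokens. The sketch below illustrates that reading only: the velocity ranges, the uniform spacing, the row-major token layout, and the function names are assumptions, not the dataset's documented scheme.

```python
N_BINS = 7
V_RANGE = (0.0, 1.0)        # linear velocity in m/s (assumed range)
OMEGA_RANGE = (-1.0, 1.0)   # angular velocity in rad/s (assumed range)


def _to_bin(value: float, lo: float, hi: float, n_bins: int = N_BINS) -> int:
    """Clip value to [lo, hi] and map it to a uniform bin index in [0, n_bins - 1]."""
    clipped = min(max(value, lo), hi)
    idx = int((clipped - lo) / (hi - lo) * n_bins)
    return min(idx, n_bins - 1)  # the upper edge belongs to the last bin


def _bin_center(idx: int, lo: float, hi: float, n_bins: int = N_BINS) -> float:
    """Return the midpoint of bin idx."""
    return lo + (idx + 0.5) * (hi - lo) / n_bins


def tokenize_action(v: float, omega: float) -> int:
    """Map a continuous (v, omega) pair to a single token in [0, 48]."""
    v_bin = _to_bin(v, *V_RANGE)
    omega_bin = _to_bin(omega, *OMEGA_RANGE)
    return v_bin * N_BINS + omega_bin  # row-major layout (assumed)


def detokenize_action(token: int) -> tuple[float, float]:
    """Map a token back to the (v, omega) bin centers."""
    v_bin, omega_bin = divmod(token, N_BINS)
    return _bin_center(v_bin, *V_RANGE), _bin_center(omega_bin, *OMEGA_RANGE)
```

Under these assumptions, decoding a token and re-encoding it returns the same token, while encoding a continuous command loses precision up to half a bin width in each dimension.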