MiniVLA-Nav v1: A Multi-Scene Simulation Dataset for Language-Conditioned Robot Navigation

May 1, 2026 | arXiv:2605.00397

cs.RO

TLDR

MiniVLA-Nav v1 is a new multi-scene simulation dataset for language-conditioned robot navigation, featuring diverse environments and object approach tasks.

Key contributions

Introduces MiniVLA-Nav v1, a simulation dataset for Language-Conditioned Object Approach (LCOA) navigation.
Features 1,174 episodes across four photorealistic Isaac Sim environments (Office, Hospital, Full Warehouse, and Warehouse with Multiple Shelves).
Provides synchronized 640x640 RGB images, metric depth maps, and instance segmentation masks, plus expert action labels in continuous (v, omega) and 7x7 tokenized form.
Ensures trajectory diversity via three spawn-distance tiers, 12 object categories, 18 training templates, and 12 paraphrase-OOD templates.
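
To make the "7x7 tokens" label format concrete, here is a minimal sketch of one plausible reading: each of v and omega is quantized into 7 uniform bins, and the pair is flattened into one of 49 tokens. The velocity ranges, the uniform binning, and the row-major token layout are all assumptions for illustration; the summary does not state the dataset's actual scheme.

```python
from typing import Tuple

N_BINS = 7  # 7 bins per axis, matching the "7x7" label format

# Hypothetical velocity limits; the source does not give the real ranges.
V_RANGE = (0.0, 1.0)       # linear velocity v, m/s (assumed)
OMEGA_RANGE = (-1.0, 1.0)  # angular velocity omega, rad/s (assumed)


def _to_bin(x: float, lo: float, hi: float, n: int = N_BINS) -> int:
    """Clamp x to [lo, hi] and map it to one of n uniform bins."""
    x = min(max(x, lo), hi)
    idx = int((x - lo) / (hi - lo) * n)
    return min(idx, n - 1)  # x == hi belongs to the last bin


def tokenize(v: float, omega: float) -> int:
    """Flatten a (v, omega) pair into a single token in [0, 48] (row-major)."""
    return _to_bin(v, *V_RANGE) * N_BINS + _to_bin(omega, *OMEGA_RANGE)


def _centre(b: int, lo: float, hi: float) -> float:
    """Return the centre value of bin b on the interval [lo, hi]."""
    return lo + (b + 0.5) * (hi - lo) / N_BINS


def detokenize(token: int) -> Tuple[float, float]:
    """Recover the bin-centre (v, omega) pair for a token."""
    v_bin, w_bin = divmod(token, N_BINS)
    return _centre(v_bin, *V_RANGE), _centre(w_bin, *OMEGA_RANGE)
```

Under these assumed ranges, tokenize(0.5, 0.0) lands in the middle cell of the grid, and detokenize maps that token back to the bin centre. A policy head trained on such labels would predict one of 49 classes per step rather than regressing two continuous values.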

Why it matters

The dataset pairs natural-language instructions with multi-modal observations and expert actions, giving researchers material for training and evaluating language-conditioned navigation policies in simulation. Its four environments and synchronized RGB, depth, and segmentation streams support work on language grounding, perception, and control. The five evaluation splits allow standardized benchmarking of in-distribution accuracy, template-paraphrase robustness, and OOD object categories.

Original Abstract

We present MiniVLA-Nav v1, a simulation dataset for Language-Conditioned Object Approach (LCOA) navigation: given a short natural-language instruction, an NVIDIA Nova Carter differential-drive robot must navigate to the named object and stop within 1 m across four photorealistic Isaac Sim environments (Office, Hospital, Full Warehouse, and Warehouse with Multiple Shelves). Each of the 1,174 episodes pairs an instruction with synchronized 640x640 RGB images, metric depth maps (float32, metres), and instance segmentation masks, together with continuous (v,omega) and 7x7 tokenized expert action labels recorded at 60 Hz from a vision-based proportional controller. Trajectory diversity is ensured through three spawn-distance tiers (near: 1.5-3.5 m, mid: 3.5-7.0 m, far: global curated points; Pearson r=0.94 between spawn distance and trajectory length), 12 object categories, 18 training templates, and 12 paraphrase-OOD templates. Five evaluation splits support in-distribution accuracy, template-paraphrase robustness, and OOD object-category benchmarking. The dataset is publicly available at https://huggingface.co/datasets/alibustami/miniVLA-Nav
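
The abstract's success criterion (stop within 1 m) and its spawn-distance versus trajectory-length correlation can be illustrated with a short sketch. It assumes each trajectory is a list of 2D (x, y) robot positions in metres and that distance is measured to a single target point. Both are assumptions, since the abstract does not specify the pose format or how the 1 m radius is measured. The helper names are hypothetical and not part of the dataset's tooling.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed (x, y) position in metres

STOP_RADIUS_M = 1.0  # success radius stated in the abstract


def path_length(path: Sequence[Point]) -> float:
    """Sum of straight-line distances between consecutive positions."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))


def is_success(path: Sequence[Point], target: Point,
               radius: float = STOP_RADIUS_M) -> bool:
    """True if the final position lies within `radius` of the target."""
    return math.dist(path[-1], target) <= radius


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spawn_vs_length_r(paths: List[Sequence[Point]],
                      targets: List[Point]) -> float:
    """Correlate spawn-to-target distance with trajectory length."""
    spawn = [math.dist(p[0], t) for p, t in zip(paths, targets)]
    lengths = [path_length(p) for p in paths]
    return pearson_r(spawn, lengths)
```

Applied to real episodes, spawn_vs_length_r would reproduce the kind of statistic the abstract reports (r=0.94), provided positions and target locations can be read from the episode records.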


