Vision-and-Language Navigation for UAVs: Progress, Challenges, and a Research Roadmap
Hanxuan Chen, Jie Zheng, Siqi Yang, Tianle Zeng, Siwei Feng, et al.
TL;DR
Comprehensive survey of UAV vision-language navigation, challenges, and a roadmap for future embodied AI research.
Key contributions
- Defines UAV-VLN tasks and traces evolution from modular to foundation model-based systems.
- Reviews key resources: simulators, datasets, and evaluation metrics for standardized research.
- Analyzes challenges like sim-to-real gap, outdoor perception, linguistic ambiguity, and hardware limits.
- Proposes a research roadmap focusing on multi-agent coordination and air-ground robotic collaboration.
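The evaluation metrics such surveys review typically include Success Rate (SR) and Success weighted by Path Length (SPL), the standard embodied-navigation measures. A minimal sketch of both, assuming an illustrative per-episode record format (the field names and the 3-meter success threshold are assumptions, not from the paper):

```python
def success_rate(episodes, threshold=3.0):
    """Fraction of episodes ending within `threshold` meters of the goal."""
    return sum(ep["final_dist"] <= threshold for ep in episodes) / len(episodes)

def spl(episodes, threshold=3.0):
    """Success weighted by Path Length: each successful episode contributes
    shortest_path / max(shortest_path, path_taken), penalizing detours."""
    total = 0.0
    for ep in episodes:
        success = ep["final_dist"] <= threshold
        total += success * ep["shortest_path"] / max(ep["shortest_path"], ep["path_taken"])
    return total / len(episodes)
```

For example, an agent that reaches the goal but flies twice the shortest path scores 1.0 on SR for that episode but only 0.5 on SPL, which is why SPL is preferred for comparing path efficiency across methods.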
Why it matters
This paper consolidates UAV vision-language navigation progress and challenges, guiding future research in embodied AI. It highlights critical barriers and emerging directions for real-world UAV deployment.
Original Abstract
Vision-and-Language Navigation for Unmanned Aerial Vehicles (UAV-VLN) represents a pivotal challenge in embodied artificial intelligence, focused on enabling UAVs to interpret high-level human commands and execute long-horizon tasks in complex 3D environments. This paper provides a comprehensive and structured survey of the field, from its formal task definition to the current state of the art. We establish a methodological taxonomy that charts the technological evolution from early modular and deep learning approaches to contemporary agentic systems driven by large foundation models, including Vision-Language Models (VLMs), Vision-Language-Action (VLA) models, and the emerging integration of generative world models with VLA architectures for physically-grounded reasoning. The survey systematically reviews the ecosystem of essential resources (simulators, datasets, and evaluation metrics) that facilitates standardized research. Furthermore, we conduct a critical analysis of the primary challenges impeding real-world deployment: the simulation-to-reality gap, robust perception in dynamic outdoor settings, reasoning with linguistic ambiguity, and the efficient deployment of large models on resource-constrained hardware. By synthesizing current benchmarks and limitations, this survey concludes by proposing a forward-looking research roadmap to guide future inquiry into key frontiers such as multi-agent swarm coordination and air-ground collaborative robotics.