Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance
Lingfeng Zhang, Xiaoshuai Hao, Xizhou Bu, Yingbo Tang, Hongsheng Li + 9 more
TLDR
Walk with Me is a map-free framework that lets robots carry out safe, long-horizon social navigation outdoors from high-level human instructions.
Key contributions
- Map-free framework for long-horizon outdoor social navigation from natural language.
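- Uses GPS context and lightweight candidate points of interest from a public map API to ground destinations semantically and propose waypoints.
- Uses a High-Level Vision-Language Model (VLM) to ground instructions into destinations and plan coarse waypoint sequences, plus an observation-aware router that decides when explicit safety reasoning is needed.
- Pairs a Low-Level Vision-Language-Action (VLA) policy for routine segments with high-level safety reasoning, including stop-and-wait behavior, for complex scenarios such as crowded crossings.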
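The destination-grounding step starts from candidate points of interest around the robot's GPS fix. The sketch below shows one simple way such candidates could be shortlisted (filtered by category and radius, then sorted by great-circle distance) before being passed to the High-Level VLM. It is a hedged illustration, not the paper's method: the `Poi` record, `shortlist_candidates`, the category filter, and the 500 m radius are all hypothetical, and the abstract does not describe how candidates are selected. The semantic grounding itself is done by the VLM, not by a distance rule.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


@dataclass(frozen=True)
class Poi:
    # Hypothetical POI record, roughly what a public map API might return.
    name: str
    category: str
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def shortlist_candidates(gps, pois, category=None, radius_m=500.0, k=5):
    """Return up to k (poi, distance_m) pairs within radius_m of the GPS fix, nearest first.

    Hypothetical helper: if category is given, only POIs of that category are kept.
    """
    lat, lon = gps
    scored = []
    for p in pois:
        if category is not None and p.category != category:
            continue
        d = haversine_m(lat, lon, p.lat, p.lon)
        if d <= radius_m:
            scored.append((d, p))
    # Sort on distance only, so ties never try to compare Poi objects.
    scored.sort(key=lambda item: item[0])
    return [(p, d) for d, p in scored[:k]]
```

For example, with a fix at `(22.3364, 114.2655)` and a cafe about 84 m away alongside one roughly 1.5 km away, `shortlist_candidates(fix, pois, category="cafe")` keeps only the nearby cafe. A shortlist like this keeps the VLM prompt small while leaving the choice of destination to the model.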
Why it matters
Existing approaches either depend on costly pre-built HD maps or, in the case of learning-based policies, are mostly confined to indoor, short-horizon settings. This paper targets the gap between them: safe, socially compliant assistance for people in complex outdoor environments without HD maps. Its hierarchical design sends routine segments to a low-level policy and reserves explicit safety reasoning for challenging situations such as crowded crossings, a step toward more capable assistive robots.
Original Abstract
Assisting humans in open-world outdoor environments requires robots to translate high-level natural-language intentions into safe, long-horizon, and socially compliant navigation behavior. Existing map-based methods rely on costly pre-built HD maps, while learning-based policies are mostly limited to indoor and short-horizon settings. To bridge this gap, we propose Walk with Me, a map-free framework for long-horizon social navigation from high-level human instructions. Walk with Me leverages GPS context and lightweight candidate points-of-interest from a public map API for semantic destination grounding and waypoint proposal. A High-Level Vision-Language Model grounds abstract instructions into concrete destinations and plans coarse waypoint sequences. During execution, an observation-aware routing mechanism determines whether the Low-Level Vision-Language-Action policy can handle the current situation or whether explicit safety reasoning from the High-Level VLM is needed. Routine segments are executed by the Low-Level VLA, while complex situations such as crowded crossings trigger high-level reasoning and stop-and-wait behavior when unsafe. By combining semantic intent grounding, map-free long-horizon planning, safety-aware reasoning, and low-level action generation, Walk with Me enables practical outdoor social navigation for human-centric assistance.
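The execution-time routing in the abstract can be sketched as a per-step dispatch between the two levels. Routine observations go to the Low-Level VLA, complex ones escalate to the High-Level VLM, and an unsafe verdict yields stop-and-wait. This is a minimal sketch under stated assumptions, not the authors' router. The `Observation` fields (pedestrian count, crossing flag, perception confidence), the thresholds, and the `low_level_policy` / `high_level_safety_check` callables are hypothetical stand-ins, because the abstract does not say which observation features the router uses. It also does not say what happens after the VLM judges a situation safe. Here control simply returns to the VLA, which is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Route(Enum):
    LOW_LEVEL_VLA = "low_level_vla"
    HIGH_LEVEL_VLM = "high_level_vlm"


@dataclass
class Observation:
    # Hypothetical summary features; a real router would work from camera input.
    pedestrian_count: int
    at_crossing: bool
    perception_confidence: float  # assumed to lie in [0, 1]


@dataclass
class Action:
    kind: str  # "move" or "wait"
    waypoint_index: int


def route(obs: Observation, max_pedestrians: int = 5, min_confidence: float = 0.6) -> Route:
    """Hypothetical rule: escalate crossings, crowds, or low-confidence perception."""
    crowded = obs.pedestrian_count > max_pedestrians
    uncertain = obs.perception_confidence < min_confidence
    if obs.at_crossing or crowded or uncertain:
        return Route.HIGH_LEVEL_VLM
    return Route.LOW_LEVEL_VLA


def step(
    obs: Observation,
    waypoint_index: int,
    low_level_policy: Callable[[Observation, int], Action],
    high_level_safety_check: Callable[[Observation], bool],
) -> Action:
    """Choose one action toward the current coarse waypoint."""
    if route(obs) is Route.LOW_LEVEL_VLA:
        return low_level_policy(obs, waypoint_index)
    # Complex situation: ask the high-level model whether proceeding is safe.
    if high_level_safety_check(obs):
        # Assumption: once judged safe, the low-level policy resumes control.
        return low_level_policy(obs, waypoint_index)
    # Unsafe: stop and wait, staying on the same waypoint.
    return Action(kind="wait", waypoint_index=waypoint_index)
```

With stub callables such as `lambda o, i: Action("move", i)` for the VLA and `lambda o: o.pedestrian_count <= 10` for the VLM check, a quiet sidewalk observation moves toward the waypoint, while a crossing with 12 pedestrians returns a `"wait"` action. Keeping the router cheap and rule-like means the expensive VLM call only happens on the escalated steps.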