LLM-Guided Agentic Floor Plan Parsing for Accessible Indoor Navigation of Blind and Low-Vision People
TLDR
This paper introduces an LLM-guided agentic framework that parses floor plan images into a structured knowledge base, enabling accessible indoor navigation for blind and low-vision (BLV) people.
Key contributions
- An agentic framework converts floor plan images into a structured knowledge base for BLV navigation.
- Uses a multi-agent module to parse floor plans into a spatial knowledge graph, self-correcting via iterative retry loops and corrective feedback (see the sketch after this list).
- A Path Planner generates accessible instructions, with a Safety Evaluator agent assessing hazards along each route (a sketch of this phase follows the abstract below).
- Outperforms single-call LLM baselines on real-world building navigation tasks.
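The self-correcting parsing loop can be pictured roughly as below. This is a minimal sketch under assumed names: `call_llm`, `validate_graph`, and `MAX_RETRIES` are illustrative stand-ins, not the authors' actual API. It shows the core pattern of feeding validation errors back into the prompt across retries.

```python
import json

MAX_RETRIES = 3  # assumed retry budget; the paper does not state one

def call_llm(prompt: str) -> str:
    """Stand-in for a vision-LLM call over the floor plan image.
    Returns a canned graph here so the sketch runs end to end."""
    return json.dumps({
        "nodes": [{"id": "lobby"}, {"id": "room_101"}],
        "edges": [{"source": "lobby", "target": "room_101"}],
    })

def validate_graph(graph: dict) -> list[str]:
    """Return human-readable problems; an empty list means the graph passed."""
    errors = []
    nodes = {n["id"] for n in graph.get("nodes", [])}
    if not nodes:
        errors.append("no rooms or landmarks were extracted")
    for edge in graph.get("edges", []):
        if edge["source"] not in nodes or edge["target"] not in nodes:
            errors.append(f"edge {edge} references an unknown node")
    return errors

def parse_floor_plan(image_ref: str) -> dict:
    """Re-prompt the model with validation errors as corrective feedback
    until the graph passes checks or the retry budget is exhausted."""
    prompt = f"Extract a spatial graph as JSON from this floor plan: {image_ref}"
    for _ in range(MAX_RETRIES):
        raw = call_llm(prompt)
        try:
            graph = json.loads(raw)
        except json.JSONDecodeError as exc:
            prompt += f"\nYour last output was not valid JSON ({exc}); fix it."
            continue
        errors = validate_graph(graph)
        if not errors:
            return graph
        prompt += "\nFix these problems and re-emit the full graph:\n" + "\n".join(errors)
    raise RuntimeError("floor plan parsing failed after corrective retries")

print(parse_floor_plan("MP-1.png"))
```

The key design point is that each failed attempt enriches the prompt with concrete errors rather than restarting from scratch, which is what the paper describes as corrective feedback.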
Why it matters
Indoor navigation remains a major challenge for blind and low-vision individuals because existing solutions depend on costly per-building infrastructure. This framework offers a scalable, infrastructure-light alternative that generates safe, accessible routes from a single floor plan image, and it consistently outperforms single-call LLM baselines, making indoor spaces more navigable.
Original Abstract
Indoor navigation remains a critical accessibility challenge for blind and low-vision (BLV) individuals, as existing solutions rely on costly per-building infrastructure. We present an agentic framework that converts a single floor plan image into a structured, retrievable knowledge base to generate safe, accessible navigation instructions with lightweight infrastructure. The system has two phases: a multi-agent module that parses the floor plan into a spatial knowledge graph through a self-correcting pipeline with iterative retry loops and corrective feedback; and a Path Planner that generates accessible navigation instructions, with a Safety Evaluator agent assessing potential hazards along each route. We evaluate the system on the real-world UMBC Math and Psychology building (floors MP-1 and MP-3) and on the CVC-FP benchmark. On MP-1, we achieve success rates of 92.31%, 76.92%, and 61.54% for short, medium, and long routes, outperforming the strongest single-call baseline (Claude 3.7 Sonnet) at 84.62%, 69.23%, and 53.85%. On MP-3, we reach 76.92%, 61.54%, and 38.46%, compared to the best baseline at 61.54%, 46.15%, and 23.08%. These results show consistent gains over single-call LLM baselines and demonstrate that our workflow is a scalable solution for accessible indoor navigation for BLV individuals.
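As a rough illustration of phase two, the sketch below plans an accessible route over a toy spatial graph and then runs a safety pass over it. The graph schema, hazard tags, and the Dijkstra search are assumptions for illustration only; in the paper, the Path Planner and Safety Evaluator are LLM agents operating on the parsed knowledge base rather than a fixed graph algorithm.

```python
import heapq

def shortest_accessible_path(graph, start, goal):
    """Dijkstra over weighted edges; edges tagged inaccessible
    (e.g. stairs-only) are skipped so routes stay BLV-friendly."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight, accessible in graph.get(node, []):
            if accessible and nxt not in seen:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None

def evaluate_safety(path, hazards):
    """Flag known hazards along the route, mirroring the role of the
    Safety Evaluator agent."""
    return [(node, hazards[node]) for node in path if node in hazards]

# Toy floor graph: node -> [(neighbor, distance in meters, accessible?)]
graph = {
    "lobby": [("corridor_a", 12.0, True), ("stairs", 4.0, False)],
    "corridor_a": [("elevator", 6.0, True), ("room_101", 9.0, True)],
    "elevator": [("room_101", 5.0, True)],
}
route = shortest_accessible_path(graph, "lobby", "room_101")
print(route)  # ['lobby', 'corridor_a', 'room_101'] -- stairs are avoided
print(evaluate_safety(route, {"corridor_a": "wet floor reported"}))
```

Separating route search from the safety pass mirrors the paper's division of labor: the planner finds a feasible accessible route, and the evaluator audits it for hazards before instructions are delivered.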