COMPASS: COmpact Multi-channel Prior-map And Scene Signature for Floor-Plan-Based Visual Localization
Muhammad Shaheer, Miguel Fernandez-Cortizas, Asier Bikandi-Noya, Holger Voos, Jose Luis Sanchez-Lopez
TLDR
COMPASS uses multi-channel radial descriptors from floor plans and fisheye images for robust visual localization by exploiting geometric and semantic priors.
Key contributions
- Introduces COMPASS, a novel algorithm for visual localization using floor plans with dual fisheye cameras.
- Designs a multi-channel radial descriptor encoding geometric and semantic priors from floor plans.
- Develops a fisheye window detection algorithm to populate visual descriptors for cross-modal matching.
- Demonstrates feasibility of matching wall-window patterns between floor plans and camera views.
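The multi-channel radial descriptor described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: given per-bin ray-cast ranges and structural hit types from the floor plan, it stacks the five channels the paper names (normalized range, hit type, range gradient, inverse range, local range variance). The function name, the hit-type encoding, and the ±2-bin variance window are assumptions for illustration.

```python
import numpy as np

def radial_descriptor(ranges, hit_types, r_max, var_halfwidth=2):
    """Build a 5-channel radial descriptor over 360 azimuthal bins.

    ranges:    (360,) ray-cast distances from the query position
    hit_types: (360,) structural labels per bin (e.g. 0=wall, 1=window,
               2=opening -- encoding is an assumption, not from the paper)
    r_max:     maximum sensing/ray-cast range, used for normalization
    """
    r = np.asarray(ranges, dtype=float)
    # Channel 1: range normalized to [0, 1]
    norm_range = r / r_max
    # Channel 2: structural hit type, kept as-is
    hits = np.asarray(hit_types, dtype=float)
    # Channel 3: range gradient across neighboring azimuthal bins
    gradient = np.gradient(r)
    # Channel 4: inverse range (guarded against division by zero)
    inv_range = 1.0 / np.maximum(r, 1e-6)
    # Channel 5: local range variance over a circular +/- var_halfwidth window
    k = 2 * var_halfwidth + 1
    padded = np.concatenate([r[-var_halfwidth:], r, r[:var_halfwidth]])
    local_var = np.array([padded[i:i + k].var() for i in range(len(r))])
    return np.stack([norm_range, hits, gradient, inv_range, local_var])
```

At a candidate pose, the same descriptor structure is filled from the floor plan (by ray casting) and from the fisheye imagery (by structural-element detection), so the two can be compared channel by channel.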
Why it matters
This paper addresses a key limitation in visual localization by leveraging both geometric and semantic information from readily available architectural floor plans. By introducing a novel multi-channel descriptor and a fisheye window detection method, COMPASS offers a more robust and practical approach for robot pose estimation.
Original Abstract
Architectural floor plans are widely available priors which contain not only geometry but also the semantic information of the environment, yet existing localization methods largely ignore this semantic information. To address this, we present COMPASS, an algorithm that exploits both geometric and semantic priors from floor plans to estimate the pose of a robot equipped with dual fisheye cameras. Inspired by the scan context descriptor from LiDAR-based place recognition, we design a multi-channel radial descriptor that encodes the geometric layout surrounding a position. From the floor plan, rays are cast in 360 azimuthal bins and the results are encoded into five channels: normalized range, structural hit type (wall, window, or opening), range gradient, inverse range, and local range variance. From the image side, the same descriptor structure is populated by detecting structural elements in the fisheye imagery. As a first step toward full cross-modal matching, we present a window detection algorithm for fisheye images that uses a line segment detector to identify window frames via vertical edge clustering and brightness verification. Detected windows are projected to azimuthal bearings through the fisheye camera model, producing the hit-type channel of the visual descriptor. As a proof of concept, we generate both descriptors at a single known pose from the Hilti-Trimble SLAM Challenge 2026 dataset and demonstrate that the wall-window pattern extracted from the first frame of each camera closely matches the floor plan descriptor, validating the feasibility of cross-modal structural matching.
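The last step of the image-side pipeline, projecting detected windows to azimuthal bearings and writing them into the hit-type channel, can be sketched as below. This is a hedged illustration, not the paper's implementation: it assumes the azimuth of a detected window can be read from the angular position of its pixel around the fisheye image center, which holds for a radially symmetric fisheye model with the optical axis vertical; the function name, label values, and coordinate convention are assumptions.

```python
import numpy as np

def window_hit_channel(window_pixels, cx, cy, n_bins=360,
                       wall_label=0, window_label=1):
    """Bin detected window pixel locations into an azimuthal hit-type channel.

    window_pixels: iterable of (u, v) pixel coordinates of detected windows
    cx, cy:        fisheye image center (principal point)
    Returns an (n_bins,) array, wall_label by default, window_label in bins
    whose bearing contains a detected window.
    """
    hits = np.full(n_bins, wall_label, dtype=int)
    for u, v in window_pixels:
        # Azimuthal bearing of the pixel around the image center, in degrees.
        # The sign/offset of this angle depends on camera mounting; a real
        # system would calibrate the convention against the floor plan frame.
        az = np.degrees(np.arctan2(v - cy, u - cx)) % 360.0
        hits[int(az * n_bins / 360.0) % n_bins] = window_label
    return hits
```

The resulting channel is directly comparable to the hit-type channel ray-cast from the floor plan, which is what the proof-of-concept wall-window matching validates.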