ArXiv TLDR

Unified Map Prior Encoder for Mapping and Planning

2605.02762

Zongzheng Zhang, Sizhe Zou, Guantian Zheng, Zhenxin Zhu, Yu Gao + 10 more

cs.CV

TLDR

UMPE is a Unified Map Prior Encoder that fuses diverse map priors (HD/SD vector maps, rasterized SD maps, satellite imagery) with BEV features, improving both online mapping and end-to-end planning for autonomous driving.

Key contributions

  • Introduces UMPE, a unified encoder that fuses heterogeneous map priors (HD/SD vector maps, rasterized SD maps, satellite imagery) with BEV features, and accepts any subset of them at test time.
  • Employs dedicated vector and raster encoders with SE(2) alignment and confidence-biased fusion for robust prior integration.
  • Improves mapping on nuScenes (+5.9 mAP over MapTRv2, +5.3 over MapQR) and Argoverse2 (+4.1 mAP over strong baselines).
  • Reduces E2E planning trajectory error by 0.30 m average L2 (0.72 → 0.42 m) and collision rate by 0.10% (0.22% → 0.12%) on nuScenes with the VAD backbone.
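The vector branch's SE(2) pre-alignment and multi-frequency sinusoidal point encoding can be sketched as below. This is a minimal illustration of the two operations, not the paper's implementation: the function names, number of frequencies, and shapes are assumptions.

```python
import numpy as np

def se2_correct(polyline, theta, t):
    """Apply a frame-wise SE(2) correction (rotation theta, translation t)
    to a polyline of 2D points, shape (N, 2)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return polyline @ R.T + t

def sinusoidal_features(points, num_freqs=4):
    """Encode each 2D point with multi-frequency sinusoids:
    sin(2^k * x) and cos(2^k * x) for k = 0..num_freqs-1, per coordinate.
    Output shape: (N, 2 * 2 * num_freqs)."""
    freqs = 2.0 ** np.arange(num_freqs)        # (F,)
    angles = points[:, :, None] * freqs        # (N, 2, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(points.shape[0], -1)  # (N, 4F)
```

In the paper, the SE(2) correction compensates pose drift between the prior's frame and the ego frame before the encoded polylines are turned into tokens with confidence scores.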

Why it matters

Autonomous driving systems often underutilize rich map priors due to their heterogeneity and inconsistent availability. UMPE provides a unified, alignment-aware solution that significantly boosts both mapping accuracy and end-to-end planning safety and efficiency. Its robustness to partial prior availability makes it highly practical.

Original Abstract

Online mapping and end-to-end (E2E) planning in autonomous driving remain largely sensor-centric, leaving rich map priors, including HD/SD vector maps, rasterized SD maps, and satellite imagery, underused because of heterogeneity, pose drift, and inconsistent availability at test time. We present UMPE, a Unified Map Prior Encoder that can ingest any subset of four priors and fuse them with BEV features for both mapping and planning. UMPE has two branches. The vector encoder pre-aligns HD/SD polylines with a frame-wise SE(2) correction, encodes points via multi-frequency sinusoidal features, and produces polyline tokens with confidence scores. BEV queries then apply cross-attention with confidence bias, followed by normalized channel-wise gating to avoid length imbalance and softly down-weight uncertain sources. The raster encoder shares a ResNet-18 backbone conditioned by FiLM with scaling and shift at every stage, performs SE(2) micro-alignment, and injects priors through zero-initialized residual fusion, so the network starts from a do-no-harm baseline and learns to add only useful prior evidence. A vector-then-raster fusion order reflects the inductive bias of geometry first, appearance second. On nuScenes mapping, UMPE lifts MapTRv2 from 61.5 to 67.4 mAP (+5.9) and MapQR from 66.4 to 71.7 mAP (+5.3). On Argoverse2, UMPE adds +4.1 mAP over strong baselines. UMPE is compositional: when trained with all priors, it outperforms single-prior models even when only one prior is available at test time, demonstrating powerset robustness. For E2E planning with the VAD backbone on nuScenes, UMPE reduces trajectory error from 0.72 to 0.42 m L2 on average (-0.30 m) and collision rate from 0.22% to 0.12% (-0.10%), surpassing recent prior-injection methods. These results show that a unified, alignment-aware treatment of heterogeneous map priors yields better mapping and better planning.
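Two mechanisms from the abstract, confidence-biased cross-attention and zero-initialized residual fusion, can be illustrated with a simplified sketch. The biasing scheme (adding log-confidence to attention logits) and the single-matrix residual projection are assumptions for illustration; the actual model uses learned multi-head attention and per-stage FiLM-conditioned fusion.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def confidence_biased_attention(queries, tokens, conf):
    """Cross-attention from BEV queries (Q, D) to prior tokens (T, D).
    Adding log-confidence to the logits softly down-weights
    low-confidence prior tokens."""
    d = queries.shape[-1]
    logits = queries @ tokens.T / np.sqrt(d) + np.log(conf + 1e-8)
    return softmax(logits) @ tokens

def zero_init_residual_fusion(bev, prior, w_res):
    """'Do-no-harm' fusion: the prior enters through a residual
    projection whose weight is zero at initialization, so the network
    starts from plain BEV features and learns to add prior evidence."""
    return bev + prior @ w_res
```

With `w_res` initialized to zeros, the fusion is exactly the identity on the BEV features at the start of training, which is what the abstract means by a "do-no-harm baseline."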
