Robust Fusion of Object-Level V2X for Learned 3D Object Detection
Lukas Ostendorf, Lennart Reiher, Onn Haran, Lutz Eckstein
TLDR
This paper proposes a robust V2X fusion method for 3D object detection in autonomous driving, improving performance and resilience to real-world V2X imperfections.
Key contributions
- Integrates object-level V2X information into 3D object detection systems.
- Emulates object-level cooperative awareness messages from nuScenes ground truth, injecting controlled noise and dropout to mimic latency, localization errors, and low V2X penetration rates.
- Fuses V2X data into a BEVFusion-style detector via a dedicated bird's-eye view (BEV) input.
- Introduces a noise-aware training strategy with explicit confidence encoding for enhanced robustness.
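The imperfection emulation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, object fields, and parameter values are assumptions. It perturbs ground-truth object positions with Gaussian noise (standing in for localization error and latency) and randomly drops objects (standing in for low V2X penetration).

```python
import numpy as np

def emulate_v2x(objects, pos_sigma=0.5, drop_prob=0.3, rng=None):
    """Emulate V2X imperfections on ground-truth object states.

    objects: list of dicts with at least "x" and "y" (meters, ego frame).
    pos_sigma: std. dev. of Gaussian localization noise (hypothetical value).
    drop_prob: per-object dropout probability (hypothetical value).
    """
    rng = rng or np.random.default_rng(0)
    out = []
    for obj in objects:
        # Random dropout mimics low penetration rate / lost messages.
        if rng.random() < drop_prob:
            continue
        noisy = dict(obj)
        # Gaussian noise mimics localization error and latency-induced drift.
        noisy["x"] = obj["x"] + rng.normal(0.0, pos_sigma)
        noisy["y"] = obj["y"] + rng.normal(0.0, pos_sigma)
        out.append(noisy)
    return out
```

Training on messages degraded this way, rather than on clean ground truth, is what the summary calls noise-aware training.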
Why it matters
Onboard sensors for autonomous driving are limited by line-of-sight. V2X communication offers a solution, but its real-world imperfections can hinder performance. This work provides a robust fusion strategy, making V2X a more reliable complement to onboard perception for safer autonomous systems.
Original Abstract
Perception for automated driving is largely based on onboard environmental sensors, such as cameras and radar, which are cost-effective but limited by line-of-sight and field-of-view constraints. These inherent limitations may cause onboard perception to fail under occlusions or poor visibility conditions. In parallel, cooperative awareness via vehicle-to-everything (V2X) communication is becoming increasingly available, enabling vehicles and infrastructure to share their own state as object-level information that complements onboard perception. In this work, we study how such V2X information can be integrated into 3D object detection and how robust the resulting system is to realistic V2X imperfections. Using the nuScenes dataset, we emulate object-level cooperative awareness messages from ground truth, injecting controlled noise and object dropout to mimic real-world conditions such as latency, localization errors, and low V2X penetration rates. We convert these messages into a dedicated bird's-eye view (BEV) input and fuse them into a BEVFusion-style detector. Our results demonstrate that while object-level cooperative information can substantially improve detection performance, achieving an NDS of 0.80 under favorable conditions, models trained on idealized data become fragile and over-reliant on V2X. Conversely, our proposed noise-aware training strategy, coupled with explicit confidence encoding, enhances robustness, maintaining performance gains even under severe noise and reduced V2X penetration.
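The abstract's "dedicated bird's-eye view (BEV) input" with "explicit confidence encoding" can be pictured as rasterizing each object-level message into a small BEV grid with a separate confidence channel. The sketch below is an assumption about one plausible layout (grid size, channel assignment, and field names are illustrative), not the detector's actual input format.

```python
import numpy as np

def objects_to_bev(objects, grid=(128, 128), extent=50.0):
    """Rasterize object-level V2X messages into a 2-channel BEV map.

    Channel 0: occupancy, channel 1: per-cell confidence
    (hypothetical encoding). `extent` is the half-range in meters
    covered by the grid, centered on the ego vehicle.
    """
    h, w = grid
    bev = np.zeros((2, h, w), dtype=np.float32)
    res = (2.0 * extent) / h  # meters per cell
    for obj in objects:
        i = int((obj["x"] + extent) / res)
        j = int((obj["y"] + extent) / res)
        if 0 <= i < h and 0 <= j < w:
            bev[0, i, j] = 1.0
            # Explicit confidence lets the detector down-weight noisy messages.
            bev[1, i, j] = max(bev[1, i, j], obj.get("conf", 1.0))
    return bev
```

A map like this can then be concatenated with the detector's camera/radar BEV features before the fusion backbone, which is the general pattern a BEVFusion-style architecture uses for extra input streams.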