ArXiv TLDR

Cross-Modal Phantom: Coordinated Camera-LiDAR Spoofing Against Multi-Sensor Fusion in Autonomous Vehicles

arXiv:2604.21841

Shahriar Rahman Khan, Raiful Hasan

cs.CR

TLDR

A coordinated camera-LiDAR spoofing attack can deceive autonomous vehicle multi-sensor fusion by creating false cross-modal consistency.

Key contributions

  • Investigates a novel vulnerability in multi-sensor fusion: fabricating cross-sensor consistency so multiple sensors agree on the same false object.
  • Designs a coordinated data-level attack emulating IR projection for cameras and LiDAR signal injection.
  • Simulates sensor-level outcomes with perspective-aware image patches and synthetic 3D LiDAR point clusters.
  • Achieves an 85.5% attack success rate against a state-of-the-art perception model across 400 KITTI scenes.
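The paper does not release its simulation code; as a rough illustration of the "synthetic 3D LiDAR point cluster" step, a fake car-sized cluster could be sampled into the raw point cloud like this (the box size, point count, uniform sampling, and intensity range are all assumptions for the sketch, not the authors' method):

```python
import numpy as np

def synth_lidar_cluster(center, size=(3.9, 1.6, 1.5), n_points=200, seed=0):
    """Sample a synthetic LiDAR point cluster inside a car-sized 3D box.

    center: (x, y, z) in the LiDAR frame; size: (length, width, height) in
    metres. Points are drawn uniformly, emulating the returns a spoofing
    injection would add to the raw cloud in a data-level (early-fusion) attack.
    """
    rng = np.random.default_rng(seed)
    l, w, h = size
    offsets = rng.uniform(low=[-l / 2, -w / 2, -h / 2],
                          high=[l / 2, w / 2, h / 2],
                          size=(n_points, 3))
    pts = np.asarray(center) + offsets
    intensity = rng.uniform(0.2, 0.8, size=(n_points, 1))  # plausible reflectance
    return np.hstack([pts, intensity])  # KITTI-style (x, y, z, r) rows

cluster = synth_lidar_cluster(center=(12.0, 0.5, -0.8))
print(cluster.shape)  # (200, 4)
```

In the attack described above, rows like these would simply be appended to a scene's point cloud before the fusion model consumes it.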

Why it matters

This paper reveals a critical, previously underexplored vulnerability in autonomous vehicle multi-sensor fusion. By demonstrating that coordinated cross-modal spoofing can bypass redundancy, it highlights a fundamental flaw in AV perception systems. This work is crucial for developing more robust and secure autonomous driving technologies.

Original Abstract

Autonomous Vehicles (AVs) increasingly depend on Multi-Sensor Fusion (MSF) to combine complementary modalities such as cameras and LiDAR for robust perception. While this redundancy is intended to safeguard against single-sensor failures, the fusion process itself introduces a subtle and underexplored vulnerability. In this work, we investigate whether an attacker can bypass MSF's redundancy by fabricating cross-sensor consistency, making multiple sensors agree on the same false object. We design a coordinated, data-level (early-fusion) attack that emulates the outcome of two synchronized physical spoofing sources: an infrared (IR) projection that induces a false camera detection and a LiDAR signal injection that produces a matching 3D point cluster. Rather than implementing the physical attack hardware, we simulate its sensor-level outcomes by inserting perspective-aware image patches and synthetic LiDAR point clusters aligned in 3D space. This approach preserves the perceptual effects that real IR and IEMI-based spoofing would create at the sensor output. Using 400 KITTI scenes, our large-scale evaluation shows that the coordinated spoofing deceives a state-of-the-art perception model with an 85.5% successful attack rate. These findings provide the first quantitative evidence that malicious cross-modal consistency can compromise MSF-based perception, revealing a critical vulnerability in the core data-fusion logic of modern autonomous vehicle systems.
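The "aligned in 3D space" requirement in the abstract amounts to placing the fake image patch where the fake LiDAR cluster projects into the camera. A minimal sketch of that alignment, using the standard KITTI projection chain (P2 · R0_rect · Tr_velo_to_cam) with placeholder calibration matrices invented for this example:

```python
import numpy as np

def project_to_image(pts_velo, Tr_velo_to_cam, R0_rect, P2):
    """Project LiDAR-frame points into pixel coordinates (KITTI convention)."""
    n = pts_velo.shape[0]
    homog = np.hstack([pts_velo, np.ones((n, 1))]).T   # (4, n) homogeneous
    cam = R0_rect @ (Tr_velo_to_cam @ homog)           # (3, n) rectified-camera frame
    img = P2 @ np.vstack([cam, np.ones((1, n))])       # (3, n) image plane
    uv = (img[:2] / img[2]).T                          # perspective divide
    return uv, img[2]                                  # pixel coords and depths

# Placeholder calibration (illustrative, not from a real KITTI calib file):
Tr_velo_to_cam = np.hstack([np.eye(3), np.zeros((3, 1))])  # 3x4 rigid transform
R0_rect = np.eye(3)                                        # 3x3 rectification
P2 = np.array([[700.0,   0.0, 600.0, 0.0],
               [  0.0, 700.0, 180.0, 0.0],
               [  0.0,   0.0,   1.0, 0.0]])                # 3x4 camera projection

# Anchoring the fake object's image patch at the projected cluster centroid
# makes the false camera detection and the LiDAR cluster agree in 3D.
uv, depth = project_to_image(np.array([[0.0, 0.0, 10.0]]),
                             Tr_velo_to_cam, R0_rect, P2)
print(uv[0], depth[0])  # patch centre in pixels, and its depth
```

Scaling the patch by the returned depth is what would make it "perspective-aware" in the paper's sense; the patch content itself (emulating an IR projection) is outside this sketch.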
