Control Your Queries: Heterogeneous Query Interaction for Camera-Radar Fusion

April 28, 2026 | arXiv:2604.25574

Jialong Wu, Yihan Wang, Matthias Rottmann

cs.CV

TLDR

ConFusion is a camera-radar 3D object detector built on heterogeneous query interaction: image, radar, and learnable world queries exchange evidence with one another, and the detector reports state-of-the-art results on nuScenes.

Key contributions

Proposes ConFusion, a camera-radar 3D object detector using a novel heterogeneous query interaction paradigm.
Combines image, radar, and learnable world queries to improve object coverage and query initialization.
Introduces Heterogeneous Query Mixing (QMix) for dedicated cross-type attention and evidence consolidation.
Presents Interactive Query Swap Sampling (QSwap) to enhance feature sampling through token exchange.
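
To make the cross-type attention idea concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention head, no learned projections, and integer type labels (0 = image, 1 = radar, 2 = world); the function name `cross_type_attention` and these labels are hypothetical. Each query attends only to queries of a different type, which is one plausible reading of "dedicated cross-type attention".

```python
import numpy as np

def cross_type_attention(queries, type_ids):
    """Single-head attention where each query attends only to other-type queries.

    queries:  (N, D) array of query embeddings
    type_ids: (N,) integer array of hypothetical type labels
              (0 = image, 1 = radar, 2 = world)
    Returns the residual-updated queries and the (N, N) attention weights.
    """
    n, d = queries.shape
    same_type = type_ids[:, None] == type_ids[None, :]
    if same_type.all(axis=1).any():
        raise ValueError("need at least two query types for cross-type attention")
    scores = queries @ queries.T / np.sqrt(d)        # scaled dot-product scores
    scores = np.where(same_type, -np.inf, scores)    # block same-type pairs, incl. self
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # softmax over allowed pairs
    return queries + weights @ queries, weights

# Illustrative usage: concatenate the three query sets into one heterogeneous set.
rng = np.random.default_rng(0)
image_q, radar_q, world_q = (rng.normal(size=(2, 8)) for _ in range(3))
all_q = np.concatenate([image_q, radar_q, world_q], axis=0)
all_types = np.array([0, 0, 1, 1, 2, 2])
mixed_q, attn = cross_type_attention(all_q, all_types)
```

A real detector would use learned query, key, and value projections and multiple heads; the mask is the only part specific to the cross-type idea.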

Why it matters

Camera-radar fusion offers complementary sensing at low deployment cost, which makes it attractive for autonomous driving. Where existing methods fuse through input mixing, feature map mixing, or query-based feature sampling, ConFusion adds direct interaction among image, radar, and world queries. On nuScenes it reports 59.1 mAP and 65.6 NDS on the validation set, and 61.6 mAP and 67.9 NDS on the test set.

Original Abstract

In autonomous driving, camera-radar fusion offers complementary sensing and low deployment cost. Existing methods perform fusion through input mixing, feature map mixing, or query-based feature sampling. We propose a new fusion paradigm, termed heterogeneous query interaction, and present ConFusion, a camera-radar 3D object detector. ConFusion combines image queries, radar queries, and learnable world queries distributed in 3D space to improve query initialization and object coverage. To encourage cross-type interaction among heterogeneous queries, we introduce heterogeneous query mixing (QMix), which performs dedicated cross-type attention after feature sampling to consolidate complementary object evidence. We further propose interactive query swap sampling (QSwap), which improves feature sampling by allowing related queries to exchange informative feature tokens under attention and geometric constraints. Experiments on the nuScenes dataset show that ConFusion achieves state-of-the-art performance, reaching 59.1 mAP and 65.6 NDS on the validation set, and 61.6 mAP and 67.9 NDS on the test set.
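
The abstract describes QSwap only at a high level, so the following is a rough sketch of how token exchange under attention and geometric constraints could look, not the paper's method. Everything here is an assumption made for illustration: the function name `swap_tokens`, the per-token informativeness scores, the distance threshold `radius`, the attention threshold `attn_min`, and the rule of replacing a query's weakest token with its partner's strongest one.

```python
import numpy as np

def swap_tokens(tokens, token_scores, centers, attn, radius=2.0, attn_min=0.2):
    """Let related queries borrow each other's most informative sampled token.

    tokens:       (N, K, D) sampled feature tokens per query
    token_scores: (N, K) hypothetical informativeness score per token
    centers:      (N, 3) query reference points in 3D
    attn:         (N, N) query-to-query attention weights
    """
    n = tokens.shape[0]
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    # Two queries count as related if they are close in 3D AND attend to each other enough.
    related = (dist <= radius) & (attn >= attn_min)
    np.fill_diagonal(related, False)
    out = tokens.copy()
    for i in range(n):
        partners = np.flatnonzero(related[i])
        if partners.size == 0:
            continue                                   # no related query: keep tokens as-is
        j = partners[np.argmax(attn[i, partners])]     # most-attended related query
        worst = np.argmin(token_scores[i])             # query i's weakest token slot
        best = np.argmax(token_scores[j])              # partner j's strongest token
        out[i, worst] = tokens[j, best]                # read from the original tokens
    return out
```

Because every replacement reads from the original `tokens` array, two mutually related queries receive each other's strongest token, so the update behaves as an exchange rather than a one-way copy.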
