ArXiv TLDR

DETOUR: A Practical Backdoor Attack against Object Detection

arXiv: 2604.24599

Dazhuang Liu, Yanqi Qiao, Rui Wang, Kaitai Liang, Georgios Smaragdakis

cs.CR

TLDR

DETOUR introduces a practical backdoor attack on object detection models using semantic, viewpoint-invariant triggers effective across diverse real-world conditions.

Key contributions

  • Existing attacks rely on fixed-location, minimal-perturbation triggers, limiting real-world practicality.
  • Discovers the "trigger radiating effect" (TRE) where patch triggers activate backdoors across neighboring locations.
  • Introduces DETOUR, a practical backdoor attack using semantic triggers (e.g., a mug) for real-world systems.
  • Employs trigger rescaling and multi-location, multi-FoV insertion for spatially and viewpoint-invariant backdoor activation.
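The rescaling and multi-location insertion step can be illustrated with a minimal sketch. This is a hypothetical data-poisoning helper, not the authors' implementation: it pastes a trigger patch into a training image at a randomly chosen scale and predefined anchor location, which is the spatial-invariance idea the contributions describe. All names (`insert_trigger`, `locations`, `scales`) are assumptions for illustration.

```python
import numpy as np

def insert_trigger(image, trigger, locations, scales, rng=None):
    """Paste a trigger patch into an image at a randomly chosen
    predefined location and scale (hypothetical poisoning helper).
    `image` and `trigger` are HxWxC uint8 arrays; `locations` are
    (y, x) top-left anchors; `scales` are resize factors."""
    rng = rng or np.random.default_rng()
    scale = rng.choice(scales)
    y0, x0 = locations[rng.integers(len(locations))]

    # Nearest-neighbor rescale of the trigger (no external deps).
    th, tw = trigger.shape[:2]
    nh, nw = max(1, int(th * scale)), max(1, int(tw * scale))
    ys = np.arange(nh) * th // nh
    xs = np.arange(nw) * tw // nw
    patch = trigger[ys][:, xs]

    # Clip the patch so it stays inside the image bounds.
    h, w = image.shape[:2]
    nh, nw = min(nh, h - y0), min(nw, w - x0)
    out = image.copy()
    out[y0:y0 + nh, x0:x0 + nw] = patch[:nh, :nw]
    return out
```

Applied across many images with varied `locations` and `scales`, the backdoored model learns to recognize the trigger regardless of where or how large it appears, which is what enables physical-world activation.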

Why it matters

Existing backdoor attacks on object detection lack real-world practicality. DETOUR addresses this by creating a robust, semantic, and viewpoint-invariant attack, exposing critical vulnerabilities in real-world vision systems.

Original Abstract

Object detection (OD) is critical to real-world vision systems, yet existing backdoor attacks on detection transformers (DETRs) for OD tasks rely on patch-wise triggers optimized at fixed locations with minimal perturbations. Such attacks overlook that backdoor triggers in the real world may appear at different sizes, fields of view (FoVs), and locations in images, while minimal perturbations are difficult for cameras to capture, limiting attack practicality. We first observe that a patch-wise trigger in DETR delivers high attack effectiveness when activating the backdoor across neighboring locations, a phenomenon we term the trigger radiating effect (TRE). Meanwhile, inserting patch-wise triggers across multiple locations synergistically enhances TRE, resulting in high attack effectiveness across images. We propose DETOUR, a practical backdoor attack by using semantic triggers that are effective in real-world object detection systems. To ensure attack practicality, we rescale trigger patterns to different sizes and insert them at various predefined locations during backdoor training, enabling the model to recognize the trigger regardless of its spatial configurations. To address FoV variations in physical deployments, we extract the trigger pattern from a real-world object (e.g., a mug) captured under multiple FoVs and inject the trigger accordingly, promoting viewpoint-invariant backdoor activation and enhancing TRE across the entire image. As a result, the backdoor can be reliably activated under diverse FoVs and spatial configurations.
