Bridge: Basis-Driven Causal Inference Marries VFMs for Domain Generalization
Mingbo Hong, Feng Liu, Caroline Gevaert, George Vosselman, Hao Cheng
TLDR
Bridge enhances object detection domain generalization by using basis-driven causal inference to block confounders and refine representations.
Key contributions
- Proposes "Bridge," a novel basis-driven framework for domain generalization in object detection.
- Uses causal inference with low-rank bases for front-door adjustment to block confounders and mitigate spurious correlations.
- Refines representations by filtering redundant and task-irrelevant components for better generalization.
- Seamlessly integrates with Vision Foundation Models (VFMs) like DINOv2/3, SAM, and Stable Diffusion.
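For readers unfamiliar with the front-door adjustment named above, the standard textbook form is given below. This is the general identity from causal inference, not a formula taken from the paper: X is the input, Y the prediction, and M a mediator through which X influences Y. How Bridge builds the mediator from its low-rank bases is not described in this summary, so the symbol M here is only a placeholder.

```latex
% Standard front-door adjustment (general identity; symbols are illustrative)
P\bigl(Y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{m} P(m \mid x) \sum_{x'} P(x')\, P(Y \mid m, x')
```

The outer sum follows the path from X to M, while the inner sum averages over other possible inputs x', which is what removes the influence of an unobserved confounder of X and Y.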
Why it matters
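Domain generalization in object detection is crucial but challenging: detectors trained on a single source domain tend to latch onto confounders such as illumination, co-occurrence, and style, and the resulting spurious correlations do not transfer to new domains. Bridge addresses this by combining causal inference with VFMs, and the authors report that it outperforms previous state-of-the-art approaches on five domain generalization benchmarks, including a newly augmented UAV-based dataset.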
Original Abstract
Detectors often suffer from degraded performance, primarily due to the distributional gap between the source and target domains. This issue is especially evident in single-source domains with limited data, as models tend to rely on confounders (e.g., illumination, co-occurrence, and style) from the source domain, leading to spurious correlations that hinder generalization. To this end, this paper proposes a novel Basis-driven framework for domain generalization, namely Bridge, that incorporates causal inference into object detection. By learning the low-rank bases for front-door adjustment, Bridge blocks confounders' effects to mitigate spurious correlations, while simultaneously refining representations by filtering redundant and task-irrelevant components. Bridge can be seamlessly integrated with both discriminative (e.g., DINOv2/3, SAM) and generative (e.g., Stable Diffusion) Vision Foundation Models (VFMs). Extensive experiments across multiple domain generalization object detection datasets, i.e., Cross-Camera, Adverse Weather, Real-to-Artistic, Diverse Weather Datasets, and Diverse Weather DroneVehicle (our newly augmented real-world UAV-based benchmark), underscore the superiority of our proposed method over previous state-of-the-art approaches. The project page is available at: https://mingbohong.github.io/Bridge/.
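To make the idea of front-door adjustment concrete, here is a minimal sketch on a toy discrete problem. It is not the authors' implementation: the function name `front_door`, the binary variables, and every probability in the tables are hypothetical values chosen for illustration, and the abstract does not say how Bridge's low-rank bases define the mediator. The sketch only shows the generic computation, where the effect of an input X on an outcome Y is routed through a mediator M and averaged over other inputs.

```python
def front_door(p_x, p_m_given_x, p_y_given_xm, x, y):
    """Return P(Y=y | do(X=x)) using the standard front-door formula.

    p_x:           dict x -> P(X=x)
    p_m_given_x:   dict x -> dict m -> P(M=m | X=x)
    p_y_given_xm:  dict (x, m) -> dict y -> P(Y=y | X=x, M=m)
    """
    total = 0.0
    for m, p_m in p_m_given_x[x].items():
        # Inner sum: average over all inputs x' to cut the confounded path.
        inner = sum(p_x[xp] * p_y_given_xm[(xp, m)][y] for xp in p_x)
        total += p_m * inner
    return total


# Hypothetical toy tables (all variables are binary).
p_x = {0: 0.5, 1: 0.5}
p_m_given_x = {
    0: {0: 0.9, 1: 0.1},
    1: {0: 0.2, 1: 0.8},
}
p_y_given_xm = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.4, 1: 0.6},
    (1, 0): {0: 0.7, 1: 0.3},
    (1, 1): {0: 0.2, 1: 0.8},
}

effect_x1 = front_door(p_x, p_m_given_x, p_y_given_xm, x=1, y=1)
effect_x0 = front_door(p_x, p_m_given_x, p_y_given_xm, x=0, y=1)
print(f"P(Y=1 | do(X=1)) = {effect_x1:.2f}")
print(f"P(Y=1 | do(X=0)) = {effect_x0:.2f}")
```

With these toy tables the two interventional probabilities come out to roughly 0.60 and 0.25. In a detector the sums would run over learned feature representations rather than small lookup tables, which is presumably where a compact set of bases becomes useful, though that connection is an inference from the abstract rather than a stated detail.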