Bridge: Basis-Driven Causal Inference Marries VFMs for Domain Generalization

April 29, 2026 | arXiv:2604.26820

Mingbo Hong, Feng Liu, Caroline Gevaert, George Vosselman, Hao Cheng

cs.CV

TLDR

Bridge enhances object detection domain generalization by using basis-driven causal inference to block confounders and refine representations.

Key contributions

Proposes "Bridge," a novel basis-driven framework for domain generalization in object detection.
Uses causal inference with low-rank bases for front-door adjustment to block confounders and mitigate spurious correlations.
Refines representations by filtering redundant and task-irrelevant components for better generalization.
Seamlessly integrates with Vision Foundation Models (VFMs) like DINOv2/3, SAM, and Stable Diffusion.
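For reference, the standard front-door adjustment formula from causal inference is written out below. The symbols are assumptions chosen for illustration: X stands for the input (image or its features), M for a mediator through which X affects the prediction, and Y for the detection output. This summary does not say how the paper instantiates M with its low-rank bases, so this is the textbook identity rather than the authors' exact formulation.

```latex
P\bigl(Y \mid \mathrm{do}(X = x)\bigr)
  = \sum_{m} P(m \mid x) \sum_{x'} P(Y \mid m, x')\, P(x')
```

The inner sum averages over all inputs x' rather than conditioning on the observed x, which is what removes the influence of an unobserved confounder acting on both X and Y.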

Why it matters

Object detectors trained on a single source domain tend to rely on confounders such as illumination, co-occurrence, and style, which produces spurious correlations that fail to transfer to new domains. Bridge addresses this by combining causal inference with VFMs, and the authors report improvements over previous state-of-the-art methods on five domain generalization detection benchmarks.

Original Abstract

Detectors often suffer from degraded performance, primarily due to the distributional gap between the source and target domains. This issue is especially evident in single-source domains with limited data, as models tend to rely on confounders (e.g., illumination, co-occurrence, and style) from the source domain, leading to spurious correlations that hinder generalization. To this end, this paper proposes a novel Basis-driven framework for domain generalization, namely Bridge, that incorporates causal inference into object detection. By learning the low-rank bases for front-door adjustment, Bridge blocks confounders' effects to mitigate spurious correlations, while simultaneously refining representations by filtering redundant and task-irrelevant components. Bridge can be seamlessly integrated with both discriminative (e.g., DINOv2/3, SAM) and generative (e.g., Stable Diffusion) Vision Foundation Models (VFMs). Extensive experiments across multiple domain generalization object detection datasets, i.e., Cross-Camera, Adverse Weather, Real-to-Artistic, Diverse Weather Datasets, and Diverse Weather DroneVehicle (our newly augmented real-world UAV-based benchmark), underscore the superiority of our proposed method over previous state-of-the-art approaches. The project page is available at: https://mingbohong.github.io/Bridge/.
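To give a rough feel for what "refining representations with low-rank bases" can look like, here is a minimal NumPy sketch. It is an illustrative assumption, not the authors' method: the function name `project_onto_bases`, the softmax weighting, and the random stand-in for learned bases are all hypothetical. It only shows that re-expressing N feature vectors as weighted mixtures of K bases yields an output of rank at most K, which discards components outside the span of the bases.

```python
import numpy as np

def project_onto_bases(features, bases):
    """Re-express features (N, D) as convex mixtures of K bases (K, D).

    Hypothetical helper for illustration; the result has rank <= K.
    """
    dim = features.shape[1]
    # Similarity of every feature vector to every basis vector: (N, K).
    scores = features @ bases.T / np.sqrt(dim)
    # Numerically stable softmax over the K bases.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each refined feature is a weighted mixture of the bases: (N, D).
    return weights @ bases, weights

rng = np.random.default_rng(0)
features = rng.normal(size=(16, 32))  # stand-in for backbone (VFM) features
bases = rng.normal(size=(4, 32))      # stand-in for learned low-rank bases
refined, weights = project_onto_bases(features, bases)
```

In a real detector the bases would be trained jointly with the task loss; here they are random only so that the sketch runs on its own.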
