MemOVCD: Training-Free Open-Vocabulary Change Detection via Cross-Temporal Memory Reasoning and Global-Local Adaptive Rectification

April 29, 2026 · arXiv 2604.26774

Zuzheng Kuang, Honghao Chang, Boqiang Liang, Haoqian Wang, Lijun He + 2 more

cs.CV, cs.AI

TLDR

MemOVCD is a training-free open-vocabulary change detection framework using cross-temporal memory reasoning and adaptive rectification for remote sensing images.

Key contributions

Reformulates change detection as a two-frame tracking problem using weighted bidirectional propagation.
Introduces histogram-aligned transition frames to smooth abrupt appearance changes across time.
Applies a global-local adaptive rectification strategy for improved spatial consistency and fine-grained detail.
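The summary does not say how the histogram-aligned transition frames are built. The sketch below shows one plausible construction under stated assumptions, not the authors' method. It uses standard CDF-based histogram matching to restyle the later image with the earlier image's intensity distribution. It then blends from that aligned version back toward the original later image, so the frames keep the later image's content while their appearance changes gradually. The names `match_histogram` and `transition_frames` and the parameter `num_frames` are hypothetical, and the code handles single-channel images only. For a multispectral image, apply it per band.

```python
import numpy as np


def match_histogram(source: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Remap single-channel `source` so its intensity CDF matches `reference`."""
    # Index of each pixel's value among the sorted unique source intensities, plus counts.
    _, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # Empirical cumulative distribution functions of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # Map each source quantile to the reference intensity at the same quantile.
    matched_values = np.interp(src_cdf, ref_cdf, ref_values)
    return matched_values[src_idx].reshape(source.shape)


def transition_frames(img_a: np.ndarray, img_b: np.ndarray, num_frames: int = 2) -> list:
    """Hypothetical transition frames: img_b's content, with appearance moving
    from img_a's intensity distribution toward img_b's own."""
    aligned_b = match_histogram(img_b, img_a).astype(np.float64)
    original_b = img_b.astype(np.float64)
    frames = []
    for k in range(num_frames):
        t = k / num_frames  # t = 0 gives img_b fully aligned to img_a's histogram
        frames.append((1.0 - t) * aligned_b + t * original_b)
    return frames
```

Under this assumption, a memory-based tracker would consume the sequence `[img_a, *transition_frames(img_a, img_b), img_b]` one frame at a time. No single step then contains the full radiometric jump between the two acquisition dates.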

Why it matters

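Existing open-vocabulary change detection methods typically process each timestamp independently or compare the two dates only at the end. This weak temporal coupling makes it hard to separate genuine semantic changes from mere appearance differences. Their patch-based inference on high-resolution images also tends to fragment change regions. MemOVCD targets both problems without any training and reports favorable performance and generalization across five benchmarks, which makes open-vocabulary change detection more robust for diverse remote sensing applications.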

Original Abstract

Open-vocabulary change detection aims to identify semantic changes in bi-temporal remote sensing images without predefined categories. Recent methods combine foundation models such as SAM, DINO and CLIP, but typically process each timestamp independently or interact only at the final comparison stage. Such paradigms suffer from insufficient temporal coupling during semantic reasoning, which limits their ability to distinguish genuine semantic changes from non-semantic appearance discrepancies. In addition, patch-dominant inference on high-resolution images often weakens global semantic continuity and produces fragmented change regions. To address these issues, we propose MemOVCD, a training-free open-vocabulary change detection framework based on cross-temporal memory reasoning and global-local adaptive rectification. Specifically, we reformulate bi-temporal change detection as a two-frame tracking problem and introduce weighted bidirectional propagation to aggregate semantic evidence from both temporal directions. To stabilize memory propagation across large temporal gaps, we construct histogram-aligned transition frames to smooth abrupt appearance changes. Moreover, a global-local adaptive rectification strategy adaptively fuses local and global-view predictions, improving spatial consistency while preserving fine-grained details. Experiments on five benchmarks demonstrate that MemOVCD achieves favorable performance on two change detection tasks, validating its effectiveness and generalization under diverse open-vocabulary settings.
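The abstract names two fusion steps, weighted bidirectional propagation and global-local adaptive rectification, but gives no formulas for either. The NumPy sketch below shows one plausible form of each. Both functions operate on per-pixel change probabilities in [0, 1]. The fixed forward weight `w_forward`, the confidence measure (distance from 0.5), and the softmax weighting with `temperature` are illustrative assumptions, not the paper's rules. The function names are also hypothetical.

```python
import numpy as np


def fuse_bidirectional(p_forward: np.ndarray, p_backward: np.ndarray,
                       w_forward: float = 0.5) -> np.ndarray:
    """Weighted average of change probabilities propagated t1->t2 and t2->t1.
    Assumption: a single global weight; the paper may weight adaptively."""
    if not 0.0 <= w_forward <= 1.0:
        raise ValueError("w_forward must lie in [0, 1]")
    return w_forward * p_forward + (1.0 - w_forward) * p_backward


def fuse_global_local(p_local: np.ndarray, p_global: np.ndarray,
                      temperature: float = 0.1) -> np.ndarray:
    """Per-pixel adaptive fusion that favors whichever view is more confident.
    Assumption: confidence = |p - 0.5| / temperature, normalized by a softmax."""
    conf_local = np.abs(p_local - 0.5) / temperature
    conf_global = np.abs(p_global - 0.5) / temperature
    # Numerically stable two-way softmax yielding the weight of the local view.
    m = np.maximum(conf_local, conf_global)
    e_local = np.exp(conf_local - m)
    e_global = np.exp(conf_global - m)
    w_local = e_local / (e_local + e_global)
    return w_local * p_local + (1.0 - w_local) * p_global
```

A binary change map would then be obtained with something like `fuse_global_local(p_local, p_global) > 0.5`. With a softmax weighting, the sharp tile-level (local) prediction dominates wherever it is confident. Elsewhere the whole-image (global) prediction fills in, and a lower `temperature` moves the fusion closer to a hard per-pixel choice.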
