ReCoVR: Closing the Loop in Interactive Composed Video Retrieval
Bingqing Zhang, Yi Zhang, Zhuo Cao, Yang Li, Xue Li + 2 more
TLDR
ReCoVR introduces a dual-pathway architecture for interactive composed video retrieval, using reflexive perception to refine search with user feedback and retrieval history.
Key contributions
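- Formalizes interactive composed video retrieval, a multi-turn extension of composed video retrieval (CoVR) in which users progressively refine their search intent through natural-language feedback.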
- Introduces ReCoVR, a dual-pathway architecture built on reflexive perception, which treats the system's own retrieval history as diagnostic evidence alongside user feedback.
- Employs an Intent Pathway to route diverse feedback to complementary retrieval channels.
- Utilizes a Reflection Pathway to monitor retrieval evolution and correct errors across turns.
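The two pathways listed above can be pictured with a toy sketch. This is not the authors' implementation: the `Trajectory` class, the keyword rule in `route_feedback`, the channel names `"visual"` and `"text"`, and the overlap thresholds in `diagnose` are all hypothetical, chosen only to illustrate what routing feedback to a channel and checking a retrieval trajectory for drift or stagnation could look like.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Per-turn history of ranked result lists (hypothetical structure)."""
    turns: list = field(default_factory=list)

    def record(self, ranked_ids):
        # Store a copy so later mutation of the caller's list has no effect.
        self.turns.append(list(ranked_ids))


def route_feedback(feedback: str) -> str:
    """Toy stand-in for an Intent Pathway: choose a retrieval channel.

    The keyword rule and the channel names are illustrative assumptions.
    """
    visual_cues = ("color", "background", "looks", "scene")
    if any(cue in feedback.lower() for cue in visual_cues):
        return "visual"
    return "text"


def diagnose(trajectory: Trajectory, k: int = 3) -> str:
    """Toy stand-in for a Reflection Pathway: compare the last two top-k sets.

    Uses Jaccard overlap: identical sets are labelled 'stagnating',
    disjoint sets 'drifting', anything else 'progressing'.
    These labels and thresholds are assumptions, not the paper's criteria.
    """
    if len(trajectory.turns) < 2:
        return "progressing"
    prev = set(trajectory.turns[-2][:k])
    curr = set(trajectory.turns[-1][:k])
    union = prev | curr
    overlap = len(prev & curr) / len(union) if union else 1.0
    if overlap == 1.0:
        return "stagnating"
    if overlap == 0.0:
        return "drifting"
    return "progressing"
```

In a closed-loop system, a diagnosis such as "stagnating" or "drifting" would feed back into the next retrieval turn, which is what distinguishes it from an open-loop design that only consumes user feedback.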
Why it matters
Existing composed video retrieval methods are restricted to a single interaction round, which does not match the progressive way people search for video in practice. This paper formalizes the multi-turn setting and proposes ReCoVR, whose reflexive design treats retrieval history as diagnostic evidence alongside user feedback. On multiple benchmarks, ReCoVR consistently outperforms interactive baselines, reaching 74.30% R@1 after one interactive round on WebVid-CoVR-Test.
Original Abstract
Composed video retrieval (CoVR) searches for target videos using a reference video and a modification text, but existing methods are restricted to a single interaction round and cannot support the progressive nature of real-world visual search. To bridge this gap, we first formalize interactive composed video retrieval, a multi-turn extension of CoVR, where users progressively refine their search intent through natural-language feedback across turns. Adapting existing interactive retrieval methods to this setting reveals two structural weaknesses: reliance on a single retrieval channel and an open-loop retrieval design that consumes user feedback but does not diagnose whether its own retrieval trajectory is drifting or stagnating. To address these limitations, we propose ReCoVR (Reflexive Composed Video Retrieval), a dual-pathway architecture built on reflexive perception, where the system treats its retrieval history as diagnostic evidence alongside user feedback. Specifically, an Intent Pathway routes heterogeneous feedback to complementary retrieval channels, while a Reflection Pathway performs trajectory-level reflection to monitor result evolution and correct retrieval errors across turns. Experiments on multiple benchmarks show that ReCoVR consistently outperforms interactive baselines, notably achieving 74.30% R@1 after just one interactive round on the WebVid-CoVR-Test dataset.
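For readers unfamiliar with the R@1 figure quoted in the abstract, the following is a minimal sketch of Recall@K, assuming the standard definition with one ground-truth target video per query. The function name `recall_at_k` and the video IDs are illustrative and do not come from the paper or its evaluation code.

```python
def recall_at_k(rankings, targets, k=1):
    """Fraction of queries whose target appears in the top-k ranked results.

    rankings: list of ranked ID lists, one per query (best match first).
    targets:  list of ground-truth IDs, one per query.
    """
    if len(rankings) != len(targets):
        raise ValueError("rankings and targets must have the same length")
    if not rankings:
        raise ValueError("at least one query is required")
    hits = sum(
        1 for ranked, target in zip(rankings, targets) if target in ranked[:k]
    )
    return hits / len(rankings)


# Made-up example: four queries, two candidates returned per query.
rankings = [["v1", "v2"], ["v3", "v4"], ["v5", "v6"], ["v7", "v8"]]
targets = ["v1", "v4", "v5", "v0"]

r_at_1 = recall_at_k(rankings, targets, k=1)  # targets ranked first: v1, v5
r_at_2 = recall_at_k(rankings, targets, k=2)  # v4 is also found at rank 2
```

Under this definition, an R@1 of 74.30% means the target video was ranked first for roughly 74 of every 100 test queries.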