ArXiv TLDR

EvoGround: Self-Evolving Video Agents for Video Temporal Grounding

2605.13803

Minjoon Jung, Byoung-Tak Zhang, Lorenzo Torresani

cs.CV

TLDR

EvoGround trains two self-evolving agents, a proposer and a solver, on raw unlabeled videos and matches or surpasses fully supervised models on video temporal grounding benchmarks without any human-labeled data.

Key contributions

  • Introduces EvoGround, a framework for video temporal grounding built on two coupled self-evolving agents: a proposer and a solver.
  • Learns from raw, unlabeled videos through a proposer-solver reinforcement learning loop in which the proposer generates query-moment pairs and the solver's feedback improves the proposer in turn.
  • Matches or surpasses fully supervised models on multiple VTG benchmarks after training on 2.5K unlabeled videos.
  • Also yields a state-of-the-art fine-grained video captioner, again without manual labels.

Why it matters

Video temporal grounding has depended on large, task-specific datasets that are costly to annotate by hand. EvoGround removes that dependency: trained on only 2.5K raw, unlabeled videos, it matches or surpasses fully supervised models on multiple VTG benchmarks. Lowering the annotation cost in this way could speed up both research on and deployment of video understanding systems.

Original Abstract

Video temporal grounding (VTG) takes an untrimmed video and a natural-language query as input and localizes the temporal moment that best matches the query. Existing methods rely on large, task-specific datasets requiring costly manual annotation. We introduce EvoGround, a framework of two coupled self-evolving agents, a proposer and a solver, that learn temporal grounding from raw videos without any human-labeled data. The proposer generates query-moment pairs from raw videos, while the solver learns to ground them and feeds back signals that improve the proposer in return. Through this self-reinforcing reinforcement-learning loop, the two agents are initialized from the same backbone and mutually improve across iterations. Trained on 2.5K unlabeled videos, EvoGround matches or surpasses fully supervised models across multiple VTG benchmarks, while emerging as a state-of-the-art fine-grained video captioner without manual labels.
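
To make "localizes the temporal moment that best matches the query" concrete, the sketch below computes temporal IoU between a predicted moment and a reference moment, along with recall at an IoU threshold. These are measures commonly reported on VTG benchmarks; the abstract does not say which metrics or reward signals EvoGround uses, so this is an illustration under that assumption, not the authors' implementation. The names `Moment`, `temporal_iou`, and `recall_at_iou` are hypothetical.

```python
from typing import Sequence, Tuple

# A moment is a (start_sec, end_sec) span in an untrimmed video.
# This representation is an assumption made for illustration.
Moment = Tuple[float, float]


def temporal_iou(pred: Moment, target: Moment) -> float:
    """Intersection-over-union of two time spans, in [0, 1]."""
    pred_start, pred_end = pred
    tgt_start, tgt_end = target
    if pred_end < pred_start or tgt_end < tgt_start:
        raise ValueError("each moment must satisfy start <= end")

    # Overlap is zero when the spans do not intersect.
    intersection = max(0.0, min(pred_end, tgt_end) - max(pred_start, tgt_start))
    union = (pred_end - pred_start) + (tgt_end - tgt_start) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at_iou(
    preds: Sequence[Moment],
    targets: Sequence[Moment],
    threshold: float = 0.5,
) -> float:
    """Fraction of queries whose predicted moment reaches the IoU threshold."""
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    if not preds:
        return 0.0
    hits = sum(temporal_iou(p, t) >= threshold for p, t in zip(preds, targets))
    return hits / len(preds)


if __name__ == "__main__":
    # Made-up spans, purely to show the call pattern.
    print(temporal_iou((0.0, 10.0), (5.0, 15.0)))
    print(recall_at_iou([(0.0, 10.0), (0.0, 5.0)], [(0.0, 10.0), (5.0, 10.0)]))
```

In a proposer-solver setup like the one the abstract describes, a score of this kind could compare the solver's predicted span with the span attached to a proposed query, but whether EvoGround does so is not stated in the abstract.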
