ArXiv TLDR

CurEvo: Curriculum-Guided Self-Evolution for Video Understanding

arXiv:2604.26707

Guiyi Zeng, Junqing Yu, Yi-Ping Phoebe Chen, Xu Chen, Wei Yang + 1 more

cs.CV, cs.LG

TLDR

CurEvo brings curriculum learning into self-evolving video understanding: it adjusts task difficulty, evaluation criteria, and data diversity to the model's current competence, so training progresses in a structured way instead of drifting.

Key contributions

  • Introduces CurEvo, a curriculum-guided self-evolution framework for video understanding.
  • Dynamically regulates task difficulty, evaluation criteria, and data diversity based on model competence.
  • Employs a multi-dimensional adaptive QA framework that jointly evolves question generation and answer evaluation across perception, recognition, and understanding dimensions.
  • Consistently improves benchmark accuracy and evaluator-based semantic score on four VideoQA benchmarks across seven backbones.

Why it matters

Existing self-evolution methods for video understanding learn without human annotations but lack structured guidance, so optimization is weakly controlled and difficulty progresses unchecked. CurEvo adds curriculum guidance that matches learning complexity to model capability, which yields more progressive improvement and better VideoQA performance, still without human annotations.

Original Abstract

Recent advances in self-evolution video understanding frameworks have demonstrated the potential of autonomous learning without human annotations. However, existing methods often suffer from weakly controlled optimization and uncontrolled difficulty progression, as they lack structured guidance throughout the iterative learning process. To address these limitations, we propose CurEvo, a curriculum-guided self-evolution framework that introduces curriculum learning into self-evolution to achieve more structured and progressive model improvement. CurEvo dynamically regulates task difficulty, refines evaluation criteria, and balances data diversity according to model competence, forming a curriculum-guided feedback loop that aligns learning complexity with model capability. Built upon this principle, we develop a multi-dimensional adaptive QA framework that jointly evolves question generation and answer evaluation across perception, recognition, and understanding dimensions, ensuring coherent and measurable curriculum progression. Through this integration, CurEvo transforms weakly controlled self-evolution into a more structured learning process for autonomous video understanding. Across seven backbones, CurEvo consistently improves both benchmark accuracy and evaluator-based semantic score on four VideoQA benchmarks, validating the effectiveness of curriculum-guided self-evolution for video understanding.
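
The curriculum-guided feedback loop described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the abstract does not give the update rule, so the integer difficulty levels, the thresholds `raise_at` and `lower_at`, and the names `CurriculumState` and `update_curriculum` are all assumptions made for illustration. Only the three dimensions (perception, recognition, understanding) come from the abstract.

```python
from dataclasses import dataclass, field

# The three dimensions are named in the abstract; everything else is hypothetical.
DIMENSIONS = ("perception", "recognition", "understanding")


@dataclass
class CurriculumState:
    # Hypothetical integer difficulty level per dimension.
    levels: dict = field(default_factory=lambda: {d: 1 for d in DIMENSIONS})
    min_level: int = 1
    max_level: int = 5


def update_curriculum(state, competence, raise_at=0.8, lower_at=0.4):
    """Return a new state with difficulty adjusted to measured competence.

    `competence` maps each dimension to a score in [0, 1], e.g. the
    evaluator's score on the last round of generated QA pairs (assumed).
    The thresholds are illustrative, not taken from the paper.
    """
    new_levels = {}
    for dim in DIMENSIONS:
        level = state.levels[dim]
        score = competence[dim]
        if score >= raise_at:
            # The model handles this level well: make questions harder.
            level = min(level + 1, state.max_level)
        elif score < lower_at:
            # The model is struggling: ease off, but never below the floor.
            level = max(level - 1, state.min_level)
        new_levels[dim] = level
    return CurriculumState(new_levels, state.min_level, state.max_level)


state = CurriculumState()
state = update_curriculum(
    state, {"perception": 0.9, "recognition": 0.6, "understanding": 0.2}
)
print(state.levels)
# → {'perception': 2, 'recognition': 1, 'understanding': 1}
```

In a full loop, the updated levels would condition the next round of question generation and answer evaluation, and the resulting scores would feed back into `update_curriculum`. Keeping one level per dimension is one simple way to let difficulty advance at different rates for perception, recognition, and understanding.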
