ArXiv TLDR

AnimationBench: Are Video Models Good at Character-Centric Animation?

2604.15299

Leyi Wu, Pengjun Fang, Kai Sun, Yazhou Xing, Yinwei Wu + 6 more

cs.CV

TLDR

AnimationBench is a new benchmark for evaluating image-to-video models' ability to generate character-centric animation, addressing the limitations of realism-focused benchmarks.

Key contributions

  • Introduces AnimationBench, the first systematic benchmark for evaluating animation image-to-video generation.
  • Operationalizes animation principles (e.g., Twelve Basic Principles, IP Preservation) into measurable dimensions.
  • Supports both standardized close-set and flexible open-set evaluation using visual-language models.
  • Exposes animation-specific quality differences overlooked by realism-oriented video benchmarks.

Why it matters

Existing video generation benchmarks struggle with animation's unique stylized appearance and character consistency. AnimationBench provides the first systematic framework to accurately evaluate animation image-to-video models, aligning well with human judgment. This enables more informative assessments, driving progress in character-centric animation.

Original Abstract

Video generation has advanced rapidly, with recent methods producing increasingly convincing animated results. However, existing benchmarks, largely designed for realistic videos, struggle to evaluate animation-style generation with its stylized appearance, exaggerated motion, and character-centric consistency. Moreover, they also rely on fixed prompt sets and rigid pipelines, offering limited flexibility for open-domain content and custom evaluation needs. To address this gap, we introduce AnimationBench, the first systematic benchmark for evaluating animation image-to-video generation. AnimationBench operationalizes the Twelve Basic Principles of Animation and IP Preservation into measurable evaluation dimensions, together with Broader Quality Dimensions including semantic consistency, motion rationality, and camera motion consistency. The benchmark supports both a standardized close-set evaluation for reproducible comparison and a flexible open-set evaluation for diagnostic analysis, and leverages visual-language models for scalable assessment. Extensive experiments show that AnimationBench aligns well with human judgment and exposes animation-specific quality differences overlooked by realism-oriented benchmarks, leading to more informative and discriminative evaluation of state-of-the-art I2V models.
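The abstract describes scoring models along measurable dimensions (e.g., semantic consistency, motion rationality, camera motion consistency) using visual-language models as judges. As a rough illustration of that setup, here is a minimal Python sketch of aggregating per-dimension, per-clip judge scores into a benchmark report. The dimension names follow the abstract; the 0-1 scoring scale, the equal weighting, and the `aggregate` helper are assumptions for illustration, not the paper's actual protocol.

```python
# Hypothetical aggregation of per-dimension scores from a VLM judge.
# Scale (0-1) and unweighted averaging are assumptions, not the
# benchmark's published procedure.
from statistics import mean

DIMENSIONS = [
    "semantic_consistency",
    "motion_rationality",
    "camera_motion_consistency",
    "ip_preservation",
]

def aggregate(scores_per_clip: list[dict[str, float]]) -> dict[str, float]:
    """Average each dimension over all evaluated clips, then report
    the unweighted mean across dimensions as an overall score."""
    per_dim = {
        d: mean(clip[d] for clip in scores_per_clip) for d in DIMENSIONS
    }
    per_dim["overall"] = mean(per_dim[d] for d in DIMENSIONS)
    return per_dim

# Example: two clips with made-up scores from a hypothetical VLM judge.
clips = [
    {"semantic_consistency": 0.8, "motion_rationality": 0.6,
     "camera_motion_consistency": 0.7, "ip_preservation": 0.9},
    {"semantic_consistency": 0.6, "motion_rationality": 0.8,
     "camera_motion_consistency": 0.5, "ip_preservation": 0.7},
]
result = aggregate(clips)
```

In practice the per-clip scores would come from prompting a visual-language model, which is what makes the evaluation scalable to open-set content; the aggregation step itself stays this simple.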
