ArXiv TLDR

AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation

arXiv: 2604.26917

Zijie Wu, Chaohui Yu, Fan Wang, Xiang Bai

cs.CV

TLDR

AnimateAnyMesh++ is a flexible 4D foundation model for high-fidelity, text-driven animation of arbitrary 3D meshes, improving quality and efficiency.

Key contributions

  • Expanded the DyMesh-XL dataset from 60K to 300K unique identities by mining dynamic content from Objaverse-XL, broadening category and motion diversity.
  • Redesigned DyMeshVAE-Flex with power-law topology-aware attention and vertex-normal enhanced features, improving trajectory reconstruction and local geometry preservation.
  • Introduced variable-length sequence training and generation, enabling longer animations while preserving reconstruction fidelity.
  • Generates semantically accurate, temporally coherent mesh animations within seconds.
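
The digest does not specify how the "power-law topology-aware attention" works. Purely as an illustrative sketch, one plausible reading is to bias attention logits between mesh vertices by a power-law decay over their edge-hop (graph) distance, so topologically nearby vertices attend to each other more strongly. All function names and the exact decay form below are assumptions, not the paper's method:

```python
# Illustrative sketch only -- NOT the paper's actual mechanism.
# Bias attention by a power-law decay over mesh-graph hop distance:
# attention weight decays like (1 + d)^(-alpha) for hop distance d.
from collections import deque
import numpy as np

def hop_distances(num_verts, edges):
    """All-pairs edge-hop distances via per-vertex BFS (fine for small meshes)."""
    adj = [[] for _ in range(num_verts)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = np.full((num_verts, num_verts), np.inf)
    for s in range(num_verts):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if dist[s, w] == np.inf:
                    dist[s, w] = dist[s, u] + 1
                    queue.append(w)
    return dist

def topology_aware_attention(q, k, v, dist, alpha=1.0):
    """Scaled dot-product attention with a power-law topology bias:
    subtracting alpha * log(1 + d) from the logits multiplies the
    attention weight by (1 + d)^(-alpha)."""
    d_k = q.shape[-1]
    logits = q @ k.T / np.sqrt(d_k) - alpha * np.log1p(dist)
    logits[np.isinf(dist)] = -1e9  # disconnected vertices barely attend
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v
```

With uniform queries and keys, the bias alone makes each vertex attend most to itself, then progressively less to vertices farther away along the mesh surface, which is one way such an attention could respect topology rather than raw 3D distance.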

Why it matters

Creating high-quality animated 3D models is challenging due to complex spatio-temporal modeling and scarce 4D data. This paper addresses these issues by providing a flexible, efficient, and high-fidelity solution for text-driven mesh animation, significantly advancing 4D content generation.

Original Abstract

Recent advances in 4D content generation have attracted increasing attention, yet creating high-quality animated 3D models remains challenging due to the complexity of modeling spatio-temporal distributions and the scarcity of 4D training data. We present AnimateAnyMesh++, a feed-forward framework for text-driven animation of arbitrary 3D meshes with substantial upgrades in data, architecture, and generative capability. First, we expand the DyMesh-XL dataset by mining dynamic content from Objaverse-XL, increasing the number of unique identities from 60K to 300K and substantially broadening category and motion diversity. Second, we redesign DyMeshVAE-Flex with power-law topology-aware attention and vertex-normal enhanced features, which significantly improves trajectory reconstruction, local geometry preservation, and mitigates trajectory-sticking artifacts. Third, we introduce architectural changes to both DyMeshVAE-Flex and the rectified-flow (RF) generator to support variable-length sequence training and generation, enabling longer animations while preserving reconstruction fidelity. Extensive experiments demonstrate that AnimateAnyMesh++ generates semantically accurate and temporally coherent mesh animations within seconds, surpassing prior approaches in quality and efficiency. The enlarged DyMesh-XL, the upgraded DyMeshVAE-Flex, and variable-length RF together deliver consistent gains across benchmarks and in-the-wild meshes. We will release code, models, and the expanded DyMesh-XL upon acceptance of this manuscript to facilitate research in 4D content creation.
