Xiang Bai

4 papers · Latest: May 13, 2026

Computer Vision

R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

R-DMesh solves pose misalignment in video-guided 3D animation using a novel VAE and rectification offset for high-fidelity 4D mesh generation.

2605.13838 · May 13, 2026

Computer Vision

HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation

HERMES++ unifies 3D scene understanding and future geometry prediction in a driving world model, outperforming specialist methods.

2604.28196 · Apr 30, 2026

Computer Vision

AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation

AnimateAnyMesh++ is a flexible 4D foundation model for high-fidelity, text-driven animation of arbitrary 3D meshes, improving quality and efficiency.

2604.26917 · Apr 29, 2026

Computer Vision

When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models

NUMINA improves numerical alignment in text-to-video diffusion models by guiding regeneration, boosting counting accuracy and CLIP alignment.

2604.08546 · Apr 9, 2026
