Yazhou Xing
2 papers · Latest:
Computer Vision
Audio-Visual Intelligence in Large Foundation Models
This survey provides the first comprehensive review of Audio-Visual Intelligence (AVI) in large foundation models, unifying tasks, methods, and challenges.
2605.04045
Computer Vision
AnimationBench: Are Video Models Good at Character-Centric Animation?
AnimationBench is a new benchmark for evaluating image-to-video models' ability to generate character-centric animation, addressing limitations of realism-focused tools.
2604.15299