Yonghui Wu

4 papers · Latest: April 17, 2026

MARCH: Multi-Agent Radiology Clinical Hierarchy for CT Report Generation

MARCH is a multi-agent AI framework that mimics the hierarchy of a radiology department to generate more accurate and reliable CT reports.

2604.16175 · Apr 17, 2026

Computer Vision

Seedance 2.0: Advancing Video Generation for World Complexity

Seedance 2.0 is a new multi-modal audio-video generation model with a unified architecture, offering advanced capabilities and improved performance.

2604.14148 · Apr 15, 2026

Natural Language Processing

Gemini: A Family of Highly Capable Multimodal Models

Gemini is a family of multimodal AI models that excel at image, audio, video, and text understanding, achieving state-of-the-art results on numerous benchmarks, including human-expert-level performance on MMLU.

2312.11805 · Dec 19, 2023

Natural Language Processing

Tacotron: Towards End-to-End Speech Synthesis

Tacotron is an end-to-end text-to-speech model that synthesizes natural-sounding speech directly from text characters without requiring complex intermediate components.

1703.10135 · Mar 29, 2017
