Yonghui Wu
4 papers · Latest:
MARCH: Multi-Agent Radiology Clinical Hierarchy for CT Report Generation
MARCH is a multi-agent AI framework that mimics the hierarchy of a radiology department to generate more accurate and reliable CT reports.
Seedance 2.0: Advancing Video Generation for World Complexity
Seedance 2.0 is a new multi-modal audio-video generation model with a unified architecture, offering advanced capabilities and improved performance.
Gemini: A Family of Highly Capable Multimodal Models
Gemini is a family of multimodal AI models that excel at image, audio, video, and text understanding, achieving state-of-the-art results across numerous benchmarks, including human-expert performance on MMLU.
Tacotron: Towards End-to-End Speech Synthesis
Tacotron is an end-to-end text-to-speech model that synthesizes natural-sounding speech directly from text characters without requiring complex intermediate components.