Yao Zhao
3 papers · Latest:
Computer Vision
Let ViT Speak: Generative Language-Image Pre-training
GenLIP is a simple, scalable generative pre-training framework that enables Vision Transformers to directly predict language tokens, achieving strong multimodal performance.
2605.00809
Statistical Machine Learning
A novel hybrid approach for positive-valued DAG learning
H-MRS is a novel algorithm for learning causal DAGs from positive-valued data by combining moment-based scoring with log-scale regression.
2604.08935
Natural Language Processing
Gemini: A Family of Highly Capable Multimodal Models
Gemini is a new family of multimodal AI models that excel at image, audio, video, and text understanding, achieving state-of-the-art results across numerous benchmarks, including human-expert-level performance on MMLU.
2312.11805