Yao Zhao

3 papers · Latest: May 1, 2026

Let ViT Speak: Generative Language-Image Pre-training

GenLIP is a simple, scalable generative pre-training framework that enables Vision Transformers to directly predict language tokens, achieving strong multimodal performance.

2605.00809 · May 1, 2026

Statistical Machine Learning

A novel hybrid approach for positive-valued DAG learning

H-MRS is a novel algorithm for learning causal DAGs from positive-valued data by combining moment-based scoring with log-scale regression.

2604.08935 · Apr 10, 2026

Natural Language Processing

Gemini: A Family of Highly Capable Multimodal Models

Gemini is a new family of multimodal AI models excelling in image, audio, video, and text understanding, achieving state-of-the-art results across numerous benchmarks, including human-expert-level performance on MMLU.

2312.11805 · Dec 19, 2023
