Tao Xiang

2 papers · Latest: April 27, 2026

Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation

Tuna-2 is a unified multimodal model that uses pixel embeddings for both understanding and generation, outperforming vision-encoder-based approaches while simplifying the architecture.

2604.24763 · Apr 27, 2026

Computer Vision

Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories

Rays as Pixels is a video diffusion model that jointly learns to generate videos and predict camera trajectories, improving robustness on sparse data.

2604.09429 · Apr 10, 2026
