Tao Xiang
2 papers · Latest:
Computer Vision
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
Tuna-2 is a unified multimodal model that uses pixel embeddings for both understanding and generation, outperforming vision-encoder-based approaches while simplifying the architecture.
2604.24763
Computer Vision
Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories
Rays as Pixels is a Video Diffusion Model that jointly learns to generate videos and predict camera trajectories, improving robustness in sparse data.
2604.09429