Junbin Xiao

2 papers · Latest: May 5, 2026

Audio-Visual Intelligence in Large Foundation Models

This survey provides the first comprehensive review of Audio-Visual Intelligence (AVI) in large foundation models, unifying tasks, methods, and challenges.

2605.04045 · May 5, 2026

Computer Vision

Ego-Grounding for Personalized Question-Answering in Egocentric Videos

This paper introduces MyEgo, a new egocentric video QA dataset, revealing that current MLLMs struggle with personalized ego-grounding and long-term memory.

2604.01966 · Apr 2, 2026
