Yizhou Wang

3 papers · Latest: April 24, 2026

GazeVLA: Learning Human Intention for Robotic Manipulation

GazeVLA uses human gaze as an intention proxy to bridge the human-robot embodiment gap, improving robotic manipulation with less robot data.

This survey categorizes and analyzes hallucinations in Video LLMs, detailing their types, causes, evaluation, and mitigation strategies.

Visually-grounded Humanoid Agents enable autonomous digital humans to perceive, reason, and act in novel 3D environments using visual observations.
