Hangjun Ye

5 papers · Latest: May 1, 2026

Thinking in Text and Images: Interleaved Vision-Language Reasoning Traces for Long-Horizon Robot Manipulation

IVLR introduces an interleaved vision-language reasoning trace for long-horizon robot manipulation, achieving high success on complex tasks.

2605.00438 · May 1, 2026

Robotics

Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance

Walk With Me is a map-free framework that enables robots to perform safe, long-horizon social navigation outdoors from high-level human instructions.

2604.26839 · Apr 29, 2026

Computer Vision

OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation

OneVL introduces a unified VLA and World Model framework, achieving state-of-the-art latent Chain-of-Thought reasoning at real-time speed.

2604.18486 · Apr 20, 2026

Computer Vision

XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments

XEmbodied is a foundation model that enhances VLMs with intrinsic 3D geometric and physical awareness for robust performance in embodied environments.

2604.18484 · Apr 20, 2026

Computer Vision

UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving

UniDriveVLA unifies autonomous driving tasks by decoupling perception and reasoning with expert Mixture-of-Transformers, achieving state-of-the-art performance.

2604.02190 · Apr 2, 2026
