Hangjun Ye
5 papers · Latest:
Thinking in Text and Images: Interleaved Vision-Language Reasoning Traces for Long-Horizon Robot Manipulation
IVLR introduces an interleaved vision-language reasoning trace for long-horizon robot manipulation, achieving high success on complex tasks.
Walk With Me: Long-Horizon Social Navigation for Human-Centric Outdoor Assistance
Walk With Me is a map-free framework that lets robots perform safe, long-horizon social navigation outdoors from high-level human instructions.
OneVL: One-Step Latent Reasoning and Planning with Vision-Language Explanation
OneVL introduces a unified VLA and World Model framework, achieving state-of-the-art latent Chain-of-Thought reasoning at real-time speed.
XEmbodied: A Foundation Model with Enhanced Geometric and Physical Cues for Large-Scale Embodied Environments
XEmbodied is a foundation model that enhances VLMs with intrinsic 3D geometric and physical awareness for robust performance in embodied environments.
UniDriveVLA: Unifying Understanding, Perception, and Action Planning for Autonomous Driving
UniDriveVLA unifies autonomous driving tasks by decoupling perception and reasoning with expert Mixture-of-Transformers, achieving SOTA performance.