Shanghang Zhang
5 papers · Latest:
HarmoWAM: Harmonizing Generalizable and Precise Manipulation via Adaptive World Action Models
HarmoWAM unifies predictive and reactive control in robot manipulation, achieving both generalizable transit and precise interaction through adaptive expert coordination.
VEGA: Visual Encoder Grounding Alignment for Spatially-Aware Vision-Language-Action Models
VEGA enhances VLA models' spatial reasoning by directly aligning their visual encoder outputs with 3D-aware features, improving robotic manipulation.
LaST-R1: Reinforcing Action via Adaptive Physical Latent Reasoning for VLA Models
LaST-R1 enhances VLA models with adaptive latent physical reasoning and a new RL algorithm, LAPO, achieving near-perfect robotic manipulation.
Hi-WM: Human-in-the-World-Model for Scalable Robot Post-Training
Hi-WM enables scalable robot post-training by allowing human intervention directly within a learned world model, reducing real-world execution needs.
Mask World Model: Predicting What Matters for Robust Robot Policy Learning
Mask World Model predicts semantic masks instead of pixels for robust robot policy learning, outperforming RGB-based world models.