Yilun Chen
3 papers
Robotics
VistaBot: View-Robust Robot Manipulation via Spatiotemporal-Aware View Synthesis
VistaBot improves the view robustness of robot manipulation by combining geometric models with video diffusion for closed-loop control.
2604.21914
Robotics
PokeVLA: Empowering Pocket-Sized Vision-Language-Action Model with Comprehensive World Knowledge Guidance
PokeVLA is a lightweight Vision-Language-Action model that improves robot manipulation by integrating comprehensive world knowledge and spatial awareness.
2604.20834
Robotics
StarVLA-$α$: Reducing Complexity in Vision-Language-Action Systems
StarVLA-$α$ simplifies Vision-Language-Action systems, offering a strong baseline that reduces complexity while achieving competitive performance across diverse benchmarks.
2604.11757