Ping Luo
4 papers · Latest:
AttenA+: Rectifying Action Inequality in Robotic Foundation Models
AttenA+ rectifies action inequality in robotic foundation models by prioritizing kinematically critical, low-velocity segments for improved manipulation.
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
Tuna-2 is a unified multimodal model using pixel embeddings for understanding and generation, outperforming vision encoders and simplifying architecture.
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
HiVLA is a hierarchical robot manipulation system that decouples VLM planning from motor control, improving long-horizon and fine-grained tasks.
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
InternVL is a 6-billion parameter vision-language foundation model that aligns large-scale vision models with LLMs to achieve state-of-the-art results across diverse visual-linguistic tasks.