Ping Luo
4 papers · Latest:
AttenA+: Rectifying Action Inequality in Robotic Foundation Models
AttenA+ rectifies action inequality in robotic foundation models by prioritizing kinematically critical, low-velocity segments for improved manipulation.
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
Tuna-2 is a unified multimodal model using pixel embeddings for understanding and generation, outperforming vision encoders and simplifying architecture.
HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
HiVLA is a hierarchical robot manipulation system that decouples VLM planning from motor control, improving long-horizon and fine-grained tasks.
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
InternVL is a 6-billion parameter vision-language foundation model that aligns large-scale vision models with LLMs to achieve state-of-the-art results across diverse visual-linguistic tasks.