GenerativeMPC: VLM-RAG-guided Whole-Body MPC with Virtual Impedance for Bimanual Mobile Manipulation
Marcelino Julio Fernando, Miguel Altamirano Cabrera, Jeffrin Sam, Yara Mahmoud, Konstantin Gubernatorov, et al.
TLDR
GenerativeMPC uses a Vision-Language Model with Retrieval-Augmented Generation (VLM-RAG) to bridge semantic scene understanding with whole-body MPC for safe, compliant bimanual mobile manipulation.
Key contributions
- Introduces GenerativeMPC, a hierarchical cyber-physical framework for bimanual mobile manipulators.
- The VLM-RAG module translates visual and linguistic context into dynamic velocity limits and safety margins for the Whole-Body MPC (see the sketch after this list).
- The same module modulates virtual impedance gains, enabling context-aware compliance during human-robot interaction.
- Achieves a 60% speed reduction near humans along with safe, socially aware navigation and manipulation.
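A minimal sketch of what this semantic-to-physical grounding could look like in code. Only the 60% speed reduction near humans is reported by the paper; the `PhysicalParams` container, the rule set, and every other scaling factor below are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class PhysicalParams:
    """Hypothetical bundle of the control parameters the VLM-RAG layer outputs."""
    v_max: float          # velocity limit passed to the MPC [m/s]
    safety_margin: float  # obstacle clearance constraint [m]
    stiffness: float      # virtual stiffness K [N/m]
    damping: float        # virtual damping D [N*s/m]

def ground_semantics(scene: str, nominal: PhysicalParams) -> PhysicalParams:
    """Toy rule set standing in for the VLM-RAG query.

    Only the 60% speed reduction near humans is reported in the paper;
    the remaining factors are guesses for illustration.
    """
    if "human" in scene:
        return PhysicalParams(
            v_max=0.4 * nominal.v_max,                  # 60% reduction (from paper)
            safety_margin=2.0 * nominal.safety_margin,  # assumed wider clearance
            stiffness=0.3 * nominal.stiffness,          # assumed softer contact
            damping=1.5 * nominal.damping,              # assumed extra damping
        )
    return nominal

nominal = PhysicalParams(v_max=1.0, safety_margin=0.3, stiffness=800.0, damping=40.0)
print(ground_semantics("human standing near the table", nominal))
```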
Why it matters
This paper addresses a critical challenge: explicitly linking high-level semantic reasoning with low-level physical control for bimanual mobile manipulation. That link lets robots perform complex tasks safely and compliantly, particularly during human-robot interaction, advancing human-centric cybernetics.
Original Abstract
Bimanual mobile manipulation requires seamless integration of high-level semantic reasoning with safe, compliant physical interaction: a challenge that end-to-end models approach opaquely and that classical controllers lack the context to address. This paper presents GenerativeMPC, a hierarchical cyber-physical framework that explicitly bridges semantic scene understanding with physical control parameters for bimanual mobile manipulators. The system uses a Vision-Language Model with Retrieval-Augmented Generation (VLM-RAG) to translate visual and linguistic context into grounded control constraints, specifically dynamic velocity limits and safety margins for a Whole-Body Model Predictive Controller (MPC). Simultaneously, the VLM-RAG module modulates virtual stiffness and damping gains for a unified impedance-admittance controller, enabling context-aware compliance during human-robot interaction. Our framework leverages an experience-driven vector database to ensure consistent parameter grounding without retraining. Experimental results in MuJoCo, IsaacSim, and on a physical bimanual platform confirm a 60% speed reduction near humans and safe, socially aware navigation and manipulation through semantic-to-physical parameter grounding. This work advances human-centric cybernetics by grounding large-scale cognitive models in predictable, high-frequency physical control loops.
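For context, a virtual impedance controller of the kind the abstract describes typically tracks second-order dynamics of the form below, with the VLM-RAG layer retuning the stiffness and damping online. This textbook form is an assumption; the paper's unified impedance-admittance law may differ in detail:

```latex
% Textbook virtual-impedance relation, where \tilde{x} = x - x_d is the
% deviation from the desired motion and F_ext is the external contact force.
% K_d and D_d are the gains the VLM-RAG module would modulate online.
\[
  M_d \,\ddot{\tilde{x}} + D_d \,\dot{\tilde{x}} + K_d \,\tilde{x} = F_{\mathrm{ext}}
\]
```

Likewise, the experience-driven retrieval step could plausibly be a nearest-neighbor lookup over embedded scene descriptions. A toy sketch, assuming cosine similarity and a hand-built database; the embeddings, schema, and retrieval policy are all assumptions, since the paper only states that retrieval replaces retraining:

```python
import numpy as np

# Hypothetical stand-in for the experience-driven vector database: embedded
# scene descriptions paired with previously validated control parameters.
EXPERIENCES = [
    (np.array([0.9, 0.1, 0.0]), {"v_max": 0.3, "safety_margin": 0.6}),  # "human nearby"
    (np.array([0.1, 0.9, 0.1]), {"v_max": 1.0, "safety_margin": 0.2}),  # "empty corridor"
    (np.array([0.2, 0.3, 0.9]), {"v_max": 0.5, "safety_margin": 0.4}),  # "fragile object"
]

def retrieve_params(query: np.ndarray) -> dict:
    """Return the parameters of the most similar stored experience."""
    def cosine(a, b):
        return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    _, params = max(EXPERIENCES, key=lambda e: cosine(e[0], query))
    return params

print(retrieve_params(np.array([0.8, 0.2, 0.1])))  # nearest entry: "human nearby"
```

A lookup like this keeps the semantics-to-parameters mapping auditable and lets new experiences be added without retraining, matching the abstract's stated rationale for the vector database.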