CoRAL: Contact-Rich Adaptive LLM-based Control for Robotic Manipulation
Berk Çiçek, Mert K. Er, Özgür S. Öğüz
TLDR
CoRAL enables robots to perform complex contact-rich manipulation tasks by using LLMs to design adaptive cost functions for real-time control.
Key contributions
- CoRAL: A modular framework decoupling LLM reasoning from low-level control for zero-shot, contact-rich manipulation.
- LLMs design adaptive cost functions for a sampling-based motion planner (MPPI), not direct controllers.
- Neuro-symbolic loop: VLM provides dynamic priors, refined by online system identification and LLM feedback.
- Retrieval-based memory unit reuses successful strategies across recurrent tasks, boosting efficiency.
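The "LLM as cost designer" idea in the second bullet can be sketched roughly: in MPPI the objective is just a callable, so an LLM-synthesized cost plugs into the planner without touching the controller. The code below is an illustrative toy (a 1-D point mass and a hand-written stand-in for the LLM-generated cost), not the authors' implementation; all names and parameters are assumptions.

```python
import numpy as np

def mppi_step(x0, u_nom, dynamics, cost_fn, n_samples=256, sigma=0.5, lam=1.0, seed=0):
    """One MPPI update: sample perturbed control sequences, roll them out,
    and blend the perturbations weighted by exponentiated negative cost."""
    rng = np.random.default_rng(seed)
    H = u_nom.shape[0]
    noise = rng.normal(0.0, sigma, size=(n_samples, H))
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        x = x0
        for t in range(H):
            u = u_nom[t] + noise[k, t]
            x = dynamics(x, u)
            costs[k] += cost_fn(x, u)
    w = np.exp(-(costs - costs.min()) / lam)  # softmin weights over samples
    w /= w.sum()
    return u_nom + w @ noise  # updated nominal control sequence

# Toy 1-D point mass: state = (position, velocity), control = acceleration.
def dynamics(x, u, dt=0.1):
    pos, vel = x
    return np.array([pos + vel * dt, vel + u * dt])

# Stand-in for an LLM-synthesized cost: reach position 1.0 with small effort.
def cost_fn(x, u):
    return (x[0] - 1.0) ** 2 + 0.01 * u ** 2
```

Run in a receding-horizon loop (apply `u[0]`, shift the sequence, repeat), the mass drifts toward the target; swapping `cost_fn` for a different objective changes the behavior with no planner changes, which is the decoupling the paper exploits.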
Why it matters
CoRAL addresses the challenge of applying LLMs to contact-rich robotic manipulation. Its adaptive, hierarchical framework boosts success rates by over 50% on average in unseen contact-rich scenarios, effectively bridging high-level AI reasoning and real-world physical interaction.
Original Abstract
While Large Language Models (LLMs) and Vision-Language Models (VLMs) demonstrate remarkable capabilities in high-level reasoning and semantic understanding, applying them directly to contact-rich manipulation remains a challenge due to their lack of explicit physical grounding and inability to perform adaptive control. To bridge this gap, we propose CoRAL (Contact-Rich Adaptive LLM-based control), a modular framework that enables zero-shot planning by decoupling high-level reasoning from low-level control. Unlike black-box policies, CoRAL uses LLMs not as direct controllers, but as cost designers that synthesize context-aware objective functions for a sampling-based motion planner (MPPI). To address the ambiguity of physical parameters in visual data, we introduce a neuro-symbolic adaptation loop: a VLM provides semantic priors for environmental dynamics, such as mass and friction estimates, which are then explicitly refined in real time via online system identification, while the LLM iteratively modulates the cost-function structure to correct strategic errors based on interaction feedback. Furthermore, a retrieval-based memory unit allows the system to reuse successful strategies across recurrent tasks. This hierarchical architecture ensures real-time control stability by decoupling high-level semantic reasoning from reactive execution, effectively bridging the gap between slow LLM inference and dynamic contact requirements. We validate CoRAL on both simulation and real-world hardware across challenging and novel tasks, such as flipping objects against walls by leveraging extrinsic contacts. Experiments demonstrate that CoRAL outperforms state-of-the-art VLA and foundation-model-based planner baselines by boosting success rates over 50% on average in unseen contact-rich scenarios, effectively handling sim-to-real gaps through its adaptive physical understanding.
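The abstract's adaptation loop, where a VLM's dynamic priors (e.g. a mass estimate) are "explicitly refined in real time via online system identification", can be illustrated with a minimal sketch: treat the prior as pseudo-observations and refine it by regularized least squares on observed force/acceleration pairs via F = m·a. This is a hypothetical simplification, not the paper's estimator; `prior_strength` and the function name are assumptions.

```python
import numpy as np

def refine_mass(prior_mass, forces, accels, prior_strength=5.0):
    """Refine a prior mass estimate (e.g. from a VLM) with observed
    (force, acceleration) pairs, minimizing
        prior_strength * (m - prior_mass)**2 + sum_i (F_i - m * a_i)**2.
    prior_strength acts as a pseudo-observation count for the prior,
    so the estimate shifts from the prior toward the data as evidence grows."""
    forces = np.asarray(forces, dtype=float)
    accels = np.asarray(accels, dtype=float)
    num = prior_strength * prior_mass + np.dot(accels, forces)
    den = prior_strength + np.dot(accels, accels)
    return num / den
```

With no observations the prior is returned unchanged; as interaction data accumulates the estimate converges to the least-squares mass, mirroring how the paper's loop corrects an inaccurate visual prior through physical interaction.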