MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation
Markus Knauer, Edoardo Fiorini, Maximilian Mühlbauer, Stefan Schneyer, Promwat Angsuratanawech + 8 more
TLDR
MOMO is a robot framework enabling non-expert users to adapt skills using kinesthetic touch, natural language, and a graphical interface for industrial tasks.
Key contributions
- Enables robot skill adaptation via kinesthetic touch, natural language, and a graphical web interface.
- Features a tool-based LLM architecture for safe, high-level natural language skill modification.
- Integrates KMPs, probabilistic Virtual Fixtures, and ergodic control for diverse tasks.
- Validated on a 7-DoF robot, demonstrating practical applicability in industrial settings.
Why it matters
This paper addresses the critical need for flexible industrial robots that non-experts can easily adapt. By combining physical, verbal, and graphical interfaces, it simplifies complex robot programming. The tool-based LLM architecture is a key innovation, enabling safer, generalized voice-commanded skill adaptation for various tasks like surface finishing.
Original Abstract
Industrial robot applications require increasingly flexible systems that non-expert users can easily adapt for varying tasks and environments. However, different adaptations benefit from different interaction modalities. We present an interactive framework that enables robot skill adaptation through three complementary modalities: kinesthetic touch for precise spatial corrections, natural language for high-level semantic modifications, and a graphical web interface for visualizing geometric relations and trajectories, inspecting and adjusting parameters, and editing via-points by drag-and-drop. The framework integrates five components: energy-based human-intention detection, a tool-based LLM architecture (where the LLM selects and parameterizes predefined functions rather than generating code) for safe natural language adaptation, Kernelized Movement Primitives (KMPs) for motion encoding, probabilistic Virtual Fixtures for guided demonstration recording, and ergodic control for surface finishing. We demonstrate that this tool-based LLM architecture generalizes skill adaptation from KMPs to ergodic control, enabling voice-commanded surface finishing. Validation on a 7-DoF torque-controlled robot at the Automatica 2025 trade fair demonstrates the practical applicability of our approach in industrial settings.
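The abstract's central safety idea, an LLM that selects and parameterizes predefined functions instead of generating code, can be illustrated with a minimal sketch. The tool names (`shift_via_point`, `set_polish_coverage`), parameter bounds, and dispatch logic below are hypothetical stand-ins, not the paper's actual tool set; the point is only the pattern of validating a structured tool call before execution.

```python
# Minimal sketch of a tool-based LLM architecture for skill adaptation.
# All tool names, parameters, and bounds here are illustrative assumptions,
# not the paper's actual interface. The LLM never emits code: it may only
# propose one of these predefined functions plus arguments, which are
# bounds-checked before execution.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    func: Callable[..., str]
    # param name -> (expected type, (min, max)) safety bounds
    params: dict

def shift_via_point(index: int, dz: float) -> str:
    return f"via-point {index} shifted by {dz:.2f} m in z"

def set_polish_coverage(coverage: float) -> str:
    return f"ergodic coverage target set to {coverage:.0%}"

TOOLS = {
    "shift_via_point": Tool(shift_via_point,
                            {"index": (int, (0, 20)), "dz": (float, (-0.1, 0.1))}),
    "set_polish_coverage": Tool(set_polish_coverage,
                                {"coverage": (float, (0.1, 1.0))}),
}

def dispatch(call: dict) -> str:
    """Validate and execute a structured tool call proposed by the LLM."""
    tool = TOOLS[call["name"]]  # unknown tool names raise KeyError -> rejected
    for pname, value in call["args"].items():
        ptype, (lo, hi) = tool.params[pname]
        if not isinstance(value, ptype) or not (lo <= value <= hi):
            raise ValueError(f"unsafe argument {pname}={value!r}")
    return tool.func(**call["args"])

# For an utterance like "raise the third via-point by five centimeters",
# the LLM would emit a structured call such as:
print(dispatch({"name": "shift_via_point", "args": {"index": 3, "dz": 0.05}}))
```

Because the executable surface is a fixed, bounds-checked function set, an out-of-range or hallucinated call is rejected before it reaches the robot, which is what makes this safer than free-form code generation.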