ArXiv TLDR

CKT-WAM: Parameter-Efficient Context Knowledge Transfer Between World Action Models

2605.06247

Yuhua Jiang, Yijun Guo, Hongbing Yang, Guojun Lei, Nuo Chen + 5 more

cs.RO

TLDR

CKT-WAM enables parameter-efficient knowledge transfer between World Action Models by injecting a compact teacher context into the student's text embeddings.

Key contributions

  • Transfers teacher WAM knowledge via a compact context in the text embedding space.
  • Uses learnable-query cross attention (LQCA) and adapters for efficient context extraction (see the sketch after this list).
  • Achieves 86.1% success on LIBERO-Plus with only 1.17% trainable parameters.
  • Demonstrates 83.3% average success rate on real-world long-horizon manipulation tasks.
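
To make the compression step concrete, here is a minimal sketch of what a learnable-query cross-attention compressor could look like. All names, the query count, and dimensions are illustrative assumptions, not taken from the paper's released code: a fixed bank of learned queries attends over the teacher's hidden states, so the output always has a fixed, small number of context tokens.

```python
import torch
import torch.nn as nn

class LQCACompressor(nn.Module):
    """Sketch of learnable-query cross attention (LQCA): compresses a
    variable-length sequence of teacher hidden states into a fixed
    number of context tokens. Hyperparameters are assumptions."""

    def __init__(self, teacher_dim: int, num_queries: int = 16, num_heads: int = 8):
        super().__init__()
        # Learnable query bank; its size fixes the compressed context length.
        # teacher_dim must be divisible by num_heads.
        self.queries = nn.Parameter(torch.randn(num_queries, teacher_dim) * 0.02)
        self.attn = nn.MultiheadAttention(teacher_dim, num_heads, batch_first=True)

    def forward(self, teacher_hidden: torch.Tensor) -> torch.Tensor:
        # teacher_hidden: (B, T, D_teacher), T may vary per batch.
        b = teacher_hidden.size(0)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)  # (B, Q, D_teacher)
        # Queries attend over the teacher states; the output has Q tokens
        # regardless of the teacher sequence length T.
        ctx, _ = self.attn(q, teacher_hidden, teacher_hidden)
        return ctx  # (B, Q, D_teacher)
```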

Why it matters

This paper addresses the challenge of transferring knowledge between heterogeneous World Action Models. CKT-WAM provides a parameter-efficient method that improves zero-shot generalization and real-world performance with minimal architectural changes, offering a practical route to robust embodied control.

Original Abstract

World action models (WAMs) provide a powerful generative framework for embodied control, yet transferring knowledge across heterogeneous WAMs remains challenging due to mismatched latent interfaces, high adaptation cost, and the rigidity of conventional distillation objectives. We propose CKT-WAM, a parameter-efficient Context Knowledge Transfer framework that transfers a teacher WAM's knowledge into a student WAM through a compact context in the text embedding space, rather than output imitation or dense hidden-state matching. Specifically, CKT-WAM extracts intermediate teacher hidden states, reduces the number of tokens via the compressor's learnable-query cross attention (LQCA), and transforms them through an always-on generalized adapter, a lightweight router, and sparsely activated specialized adapters. The resulting context is then appended to the student's conditioning text embeddings, thereby injecting the transferred knowledge into the student with minimal architectural modification. Experiments show that CKT-WAM consistently improves zero-shot generalization and achieves the best overall performance on LIBERO-Plus, reaching an 86.1% total success rate with only 1.17% trainable parameters, while approaching full fine-tuning performance. Beyond simulation, CKT-WAM also demonstrates strong real-world long-horizon manipulation ability, achieving the best average success rate of 83.3% across four multi-step and long-horizon tasks. Code is available at https://github.com/YuhuaJiang2002/CKT-WAM.
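
The abstract's transform-and-inject step can be sketched as follows, assuming a standard bottleneck-adapter design and top-1 routing; the module names, residual form, and routing rule are illustrative assumptions, and the repository linked above is the authoritative implementation. The compressed context passes through the always-on generalized adapter, each token is routed to one specialized adapter, and the result is concatenated onto the student's text embeddings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(F.gelu(self.down(x)))

class ContextTransform(nn.Module):
    """Sketch: always-on generalized adapter, a lightweight router, and
    sparsely activated specialized adapters (top-1 routing assumed)."""
    def __init__(self, teacher_dim: int, student_dim: int, num_specialists: int = 4):
        super().__init__()
        self.proj = nn.Linear(teacher_dim, student_dim)  # bridge dimensions
        self.general = Adapter(student_dim)              # always on
        self.router = nn.Linear(student_dim, num_specialists)
        self.specialists = nn.ModuleList(
            Adapter(student_dim) for _ in range(num_specialists)
        )

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        # ctx: (B, Q, D_teacher), e.g. the LQCA output above.
        h = self.general(self.proj(ctx))                 # (B, Q, D_student)
        # Route each context token to exactly one specialized adapter.
        idx = self.router(h).argmax(dim=-1)              # (B, Q)
        out = torch.zeros_like(h)
        for i, spec in enumerate(self.specialists):
            mask = (idx == i).unsqueeze(-1)              # (B, Q, 1)
            out = out + torch.where(mask, spec(h), torch.zeros_like(h))
        return out

def inject_context(text_emb: torch.Tensor, ctx: torch.Tensor) -> torch.Tensor:
    """Append transferred context tokens to the student's conditioning
    text embeddings along the sequence dimension."""
    return torch.cat([text_emb, ctx], dim=1)             # (B, T+Q, D_student)
```

Under this reading, only the compressor, adapters, router, and projection would be trained while both WAM backbones stay frozen, which is consistent with the paper's reported 1.17% trainable parameters.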
