PriorVLA: Prior-Preserving Adaptation for Vision-Language-Action Models
Xinyu Guo, Bin Xie, Wei Chai, Xianchi Deng, Tiancai Wang, et al.
TLDR
PriorVLA adapts Vision-Language-Action models while preserving their broad pretrained priors, pairing a frozen Prior Expert with a trainable Adaptation Expert to outperform full fine-tuning.
Key contributions
- Introduces PriorVLA, a framework preserving pretrained priors for VLA model adaptation.
- Pairs a frozen, read-only Prior Expert with a trainable Adaptation Expert specialized for downstream tasks.
- Learnable Expert Queries capture scene priors from the pretrained VLM and motor priors from the Prior Expert to guide adaptation.
- Outperforms full fine-tuning and baselines, especially in out-of-distribution and few-shot settings.
Why it matters
Current VLA model adaptation often loses broad pretrained knowledge. PriorVLA addresses this by preserving priors, leading to more robust and efficient adaptation. This is crucial for developing generalist robots that perform well across diverse, novel tasks with limited data.
Original Abstract
Large-scale pretraining has made Vision-Language-Action (VLA) models promising foundations for generalist robot manipulation, yet adapting them to downstream tasks remains necessary. However, the common practice of full fine-tuning treats pretraining as initialization and can shift broad priors toward narrow training-distribution patterns. We propose PriorVLA, a novel framework that preserves pretrained priors and learns to leverage them for effective adaptation. PriorVLA keeps a frozen Prior Expert as a read-only prior source and trains an Adaptation Expert for downstream specialization. Expert Queries capture scene priors from the pretrained VLM and motor priors from the Prior Expert, integrating both into the Adaptation Expert to guide adaptation. Together, PriorVLA updates only 25% of the parameters updated by full fine-tuning. Across RoboTwin 2.0, LIBERO, and real-world tasks, PriorVLA achieves stronger overall performance than full fine-tuning and state-of-the-art VLA baselines, with the largest gains under out-of-distribution (OOD) and few-shot settings. PriorVLA improves over π0.5 by 11 points on RoboTwin 2.0-Hard and achieves 99.1% average success on LIBERO. Across eight real-world tasks and two embodiments, PriorVLA reaches 81% in-distribution (ID) and 57% OOD success with standard data. With only 10 demonstrations per task, PriorVLA reaches 48% ID and 32% OOD success, surpassing π0.5 by 24 and 22 points, respectively.
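A minimal PyTorch sketch may help make the dual-expert idea concrete. Everything below (module names, dimensions, the cross-attention wiring, and how the queries are injected into the Adaptation Expert) is an illustrative assumption, not the authors' implementation:

```python
# Hedged sketch of PriorVLA's dual-expert design under assumed interfaces.
# Assumptions: both experts are transformer blocks over (batch, seq, d_model)
# tokens, and Expert Queries read priors via cross-attention.
import copy
import torch
import torch.nn as nn

class PriorVLASketch(nn.Module):
    def __init__(self, pretrained_expert: nn.Module, d_model: int = 512,
                 num_queries: int = 16):
        super().__init__()
        # Frozen Prior Expert: a read-only copy of the pretrained action expert.
        self.prior_expert = copy.deepcopy(pretrained_expert)
        for p in self.prior_expert.parameters():
            p.requires_grad = False
        # Adaptation Expert: the trainable counterpart for downstream tasks.
        self.adaptation_expert = copy.deepcopy(pretrained_expert)
        # Expert Queries: learnable tokens that gather scene priors from the
        # VLM and motor priors from the frozen Prior Expert.
        self.expert_queries = nn.Parameter(torch.randn(num_queries, d_model))
        self.scene_attn = nn.MultiheadAttention(d_model, 8, batch_first=True)
        self.motor_attn = nn.MultiheadAttention(d_model, 8, batch_first=True)

    def forward(self, vlm_tokens, action_tokens):
        # Read motor priors without updating the Prior Expert.
        with torch.no_grad():
            motor_feats = self.prior_expert(action_tokens)
        q = self.expert_queries.expand(vlm_tokens.size(0), -1, -1)
        q, _ = self.scene_attn(q, vlm_tokens, vlm_tokens)    # scene priors
        q, _ = self.motor_attn(q, motor_feats, motor_feats)  # motor priors
        # Inject prior-conditioned queries into the Adaptation Expert, here by
        # prepending them to its input sequence (an assumption).
        return self.adaptation_expert(torch.cat([q, action_tokens], dim=1))

if __name__ == "__main__":
    # Toy stand-in for the pretrained action expert.
    expert = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
        num_layers=2)
    model = PriorVLASketch(expert)
    out = model(torch.randn(2, 32, 512), torch.randn(2, 8, 512))
    print(out.shape)  # torch.Size([2, 24, 512]): 16 queries + 8 action tokens
```

Note that only the Adaptation Expert, the Expert Queries, and the two attention blocks receive gradients here, which mirrors the abstract's point that PriorVLA trains a small fraction of the parameters touched by full fine-tuning.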