ArXiv TLDR

TMRL: Diffusion Timestep-Modulated Pretraining Enables Exploration for Efficient Policy Finetuning

arXiv: 2605.12236

Matthew M. Hong, Jesse Zhang, Anusha Nagabandi, Abhishek Gupta

cs.RO cs.AI cs.LG

TLDR

TMRL introduces diffusion timestep-modulated pre-training that gives robot policies explicit control over exploration, making RL fine-tuning substantially more sample-efficient.

Key contributions

  • Introduces Context-Smoothed Pre-training (CSP) using diffusion noise for broad action coverage.
  • Develops Timestep-Modulated RL (TMRL) for dynamic, explicit control over exploration during fine-tuning.
  • Significantly improves sample efficiency for RL fine-tuning of pre-trained robot policies.
  • Enables successful real-world fine-tuning of complex manipulation tasks in under one hour.
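The noise-injection idea behind CSP can be sketched with the standard DDPM forward process: an input is interpolated between its clean value and Gaussian noise according to a diffusion timestep, so small timesteps preserve the demonstration distribution while large ones broaden coverage. The linear beta schedule and all parameter values below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def forward_diffusion_noise(x0, t, T=100, beta_start=1e-4, beta_end=0.02, rng=None):
    """Noise an input x0 to diffusion timestep t via the DDPM forward process:
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,  eps ~ N(0, I).
    Schedule and hyperparameters here are illustrative, not the paper's."""
    rng = rng or np.random.default_rng(0)
    betas = np.linspace(beta_start, beta_end, T)   # assumed linear schedule
    abar = np.cumprod(1.0 - betas)                 # cumulative signal fraction
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar[t]) * x0 + np.sqrt(1.0 - abar[t]) * eps

# t = 0: near-exact input (precise imitation); larger t: broader action coverage
state = np.ones(4)
slightly_noised = forward_diffusion_noise(state, t=0)
heavily_noised = forward_diffusion_noise(state, t=99)
```

Varying `t` traces the continuum the paper describes between precise imitation and broad coverage.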

Why it matters

Pre-trained robot policies often struggle to explore during RL fine-tuning. TMRL bridges BC pre-training and RL fine-tuning through diffusion timestep modulation, giving the agent explicit control over exploration. The resulting gain in sample efficiency makes real-world fine-tuning of robot manipulation practical in under an hour.

Original Abstract

Fine-tuning pre-trained robot policies with reinforcement learning (RL) often inherits the bottlenecks introduced by pre-training with behavioral cloning (BC), which produces narrow action distributions that lack the coverage necessary for downstream exploration. We present a unified framework that enables the exploration necessary to enable efficient robot policy finetuning by bridging BC pre-training and RL fine-tuning. Our pre-training method, Context-Smoothed Pre-training (CSP), injects forward-diffusion noise into policy inputs, creating a continuum between precise imitation and broad action coverage. We then fine-tune pre-trained policies via Timestep-Modulated Reinforcement Learning (TMRL), which trains the agent to dynamically adjust this conditioning during fine-tuning by modulating the diffusion timestep, granting explicit control over exploration. Integrating seamlessly with arbitrary policy inputs, e.g., states, 3D point clouds, or image-based VLA policies, we show that TMRL improves RL fine-tuning sample efficiency. Notably, TMRL enables successful real-world fine-tuning on complex manipulation tasks in under one hour. Videos and code available at https://weirdlabuw.github.io/tmrl/.
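A hedged sketch of the timestep-modulated fine-tuning idea described above: the agent selects a diffusion timestep for its own input conditioning, where t = 0 recovers precise imitation and larger t yields noisier conditioning and broader exploration. `policy_mean_fn` and `choose_timestep_fn` are hypothetical stand-ins; in the paper the timestep choice is trained with RL rather than hand-specified.

```python
import numpy as np

def tmrl_act(obs, policy_mean_fn, choose_timestep_fn, T=100, rng=None):
    """Illustrative timestep-modulated acting: noise the observation to the
    agent-chosen diffusion timestep t, then query the policy on the noisy
    conditioning. Names and schedule are assumptions, not the paper's API."""
    rng = rng or np.random.default_rng(0)
    t = choose_timestep_fn(obs)                    # learned via RL in the paper
    betas = np.linspace(1e-4, 0.02, T)             # assumed linear schedule
    abar = np.cumprod(1.0 - betas)
    noisy_obs = np.sqrt(abar[t]) * obs + np.sqrt(1.0 - abar[t]) * rng.standard_normal(obs.shape)
    return policy_mean_fn(noisy_obs), t

# t = 0 keeps conditioning close to the true observation (exploitation);
# t near T - 1 heavily perturbs it (exploration).
action, t = tmrl_act(np.zeros(3), lambda o: 2.0 * o, lambda o: 0)
```

Because t is part of the agent's decision, fine-tuning can anneal it per-state, which is the "explicit control over exploration" the abstract refers to.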
