ArXiv TLDR

FASTER: Value-Guided Sampling for Fast RL

arXiv:2604.19730

Perry Dong, Alexander Swerdlow, Dorsa Sadigh, Chelsea Finn

cs.LG, cs.AI

TLDR

FASTER is a value-guided sampling method for diffusion-based RL policies that filters action candidates during denoising, substantially reducing computational cost while maintaining or improving performance.

Key contributions

  • Introduces FASTER, a value-guided sampling method that delivers the benefits of sampling-based test-time scaling for diffusion-based RL policies at much lower cost.
  • Models the denoising of multiple action candidates as an MDP whose goal is to progressively filter out low-value candidates before denoising is complete (see the sketch after this list).
  • Learns a policy and value function in the denoising space that predict the downstream value of action candidates and select high-performing ones early.
  • Substantially reduces compute for generative RL while improving performance on long-horizon manipulation tasks.
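To make the filtering idea concrete, here is a minimal sketch of value-guided candidate pruning during denoising. It assumes a hypothetical one-step denoiser `denoise_step(x, t, obs)` and a learned value head `value_fn(x, t, obs)`; these names, the keep-fraction schedule, and all hyperparameters are illustrative, not the paper's actual API.

```python
import torch

def sample_with_value_filtering(
    denoise_step,       # hypothetical: one reverse-diffusion step (x_t, t, obs) -> x_{t-1}
    value_fn,           # hypothetical: value head scoring partially denoised candidates
    obs,                # conditioning observation
    num_candidates=64,  # initial best-of-N budget
    num_steps=10,       # number of denoising steps
    keep_fraction=0.5,  # fraction of candidates kept after each step
    action_dim=7,
):
    """Best-of-N action sampling with progressive value-based pruning.

    Rather than fully denoising all N candidates and then picking the
    best one, a value function scores partially denoised candidates and
    discards low-value ones early, so later denoising steps run on a
    shrinking set of promising samples.
    """
    # Start every candidate from Gaussian noise.
    x = torch.randn(num_candidates, action_dim)

    for t in reversed(range(num_steps)):
        # One reverse-diffusion step for all surviving candidates.
        x = denoise_step(x, t, obs)

        # Prune: keep only the top fraction by predicted downstream value.
        if x.shape[0] > 1:
            values = value_fn(x, t, obs)                   # shape: (num_surviving,)
            k = max(1, int(x.shape[0] * keep_fraction))
            x = x[torch.topk(values, k).indices]

    # Return the highest-value fully denoised action.
    return x[value_fn(x, 0, obs).argmax()]
```

With these illustrative defaults (64 candidates, keep fraction 0.5), the candidate set collapses to a single sample within six steps, so most denoising compute is spent only on candidates the value function already ranks highly, which is where the savings over plain best-of-N sampling come from.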

Why it matters

This paper tackles the high computational cost of sampling-based test-time scaling in modern RL. FASTER offers a practical fix: a learned value function filters action candidates early in the denoising process, so generative RL algorithms keep the benefits of best-of-N sampling at a fraction of the compute, broadening their applicability to complex real-world tasks.

Original Abstract

Some of the most performant reinforcement learning algorithms today can be prohibitively expensive as they use test-time scaling methods such as sampling multiple action candidates and selecting the best one. In this work, we propose FASTER, a method for getting the benefits of sampling-based test-time scaling of diffusion-based policies without the computational cost by tracing the performance gain of action samples back to earlier in the denoising process. Our key insight is that we can model the denoising of multiple action candidates and selecting the best one as a Markov Decision Process (MDP) where the goal is to progressively filter action candidates before denoising is complete. With this MDP, we can learn a policy and value function in the denoising space that predicts the downstream value of action candidates in the denoising process and filters them while maximizing returns. The result is a method that is lightweight and can be plugged into existing generative RL algorithms. Across challenging long-horizon manipulation tasks in online and batch-online RL, FASTER consistently improves the underlying policies and achieves the best overall performance among the compared methods. Applied to a pretrained VLA, FASTER achieves the same performance while substantially reducing training and inference compute requirements. Code is available at https://github.com/alexanderswerdlow/faster.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.