Tempered Sequential Monte Carlo for Trajectory and Policy Optimization with Differentiable Dynamics
TLDR
This paper introduces Tempered Sequential Monte Carlo (TSMC), a sampling-based framework for trajectory and policy optimization with differentiable dynamics.
Key contributions
- Proposes Tempered Sequential Monte Carlo (TSMC) for trajectory and policy optimization via inference.
- TSMC uses an annealing scheme with adaptive reweighting and resampling along a tempering path.
- Employs Hamiltonian Monte Carlo (HMC) rejuvenation to maintain diversity and exploit exact gradients.
- Extends TSMC for policy optimization using empirical state approximation and extended-space construction.
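The annealing loop described in the bullets above can be sketched as a generic tempered-SMC routine. The sketch below is illustrative only: it anneals a particle population from a standard-normal prior toward a Boltzmann-tilted target over a toy parameter vector (standing in for controller parameters evaluated through trajectory rollouts), and it substitutes a random-walk Metropolis move for the paper's HMC rejuvenation. `tempered_smc` and all of its parameters are hypothetical names, not the authors' implementation.

```python
import math
import random

def tempered_smc(cost, num_particles=200, dim=2, num_steps=30,
                 step_size=0.1, seed=0):
    """Anneal particles from a N(0, I) prior toward a target proportional
    to p(theta) * exp(-cost(theta)) along a fixed tempering path beta: 0 -> 1."""
    rng = random.Random(seed)
    particles = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                 for _ in range(num_particles)]
    log_w = [0.0] * num_particles
    for k in range(1, num_steps + 1):
        b_prev, b_next = (k - 1) / num_steps, k / num_steps
        # Incremental importance reweighting along the tempering path.
        for i, p in enumerate(particles):
            log_w[i] -= (b_next - b_prev) * cost(p)
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        total = sum(w)
        w = [x / total for x in w]
        # Resample adaptively when the effective sample size collapses.
        ess = 1.0 / sum(x * x for x in w)
        if ess < num_particles / 2:
            particles = [p[:] for p in
                         rng.choices(particles, weights=w, k=num_particles)]
            log_w = [0.0] * num_particles
        # Rejuvenation: one random-walk Metropolis step per particle,
        # targeting p(theta) * exp(-b_next * cost(theta)) -- a gradient-free
        # stand-in for the HMC moves used in the paper.
        for i, p in enumerate(particles):
            prop = [x + step_size * rng.gauss(0.0, 1.0) for x in p]
            log_acc = (-b_next * (cost(prop) - cost(p))
                       - 0.5 * (sum(x * x for x in prop)
                                - sum(x * x for x in p)))
            if rng.random() < math.exp(min(0.0, log_acc)):
                particles[i] = prop
    return particles
```

On a simple quadratic cost, the returned particles concentrate near the low-cost region, which is the behavior the tempering-plus-rejuvenation design is meant to produce.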
Why it matters
This paper offers an efficient sampling-based method for controller design that handles the sharp, potentially multimodal distributions arising in trajectory and policy optimization. By exploiting differentiable dynamics through gradient-based HMC rejuvenation, TSMC compares favorably to state-of-the-art baselines across trajectory- and policy-optimization benchmarks.
Original Abstract
We propose a sampling-based framework for finite-horizon trajectory and policy optimization under differentiable dynamics by casting controller design as inference. Specifically, we minimize a KL-regularized expected trajectory cost, which yields an optimal "Boltzmann-tilted" distribution over controller parameters that concentrates on low-cost solutions as temperature decreases. To sample efficiently from this sharp, potentially multimodal target, we introduce tempered sequential Monte Carlo (TSMC): an annealing scheme that adaptively reweights and resamples particles along a tempering path from a prior to the target distribution, while using Hamiltonian Monte Carlo rejuvenation to maintain diversity and exploit exact gradients obtained by differentiating through trajectory rollouts. For policy optimization, we extend TSMC via (i) a deterministic empirical approximation of the initial-state distribution and (ii) an extended-space construction that treats rollout randomness as auxiliary variables. Experiments across trajectory- and policy-optimization benchmarks show that TSMC is broadly applicable and compares favorably to state-of-the-art baselines.
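The KL-regularized objective described in the abstract has a standard closed-form optimizer; in notation not taken from the paper (with \(J\) the expected trajectory cost, \(p\) a prior over controller parameters \(\theta\), and \(\beta\) an inverse temperature):

```latex
\min_{q}\; \mathbb{E}_{\theta \sim q}\!\left[ J(\theta) \right]
  + \frac{1}{\beta}\, \mathrm{KL}\!\left(q \,\|\, p\right)
\quad\Longrightarrow\quad
q^{*}(\theta) \;\propto\; p(\theta)\, \exp\!\left(-\beta\, J(\theta)\right)
```

As \(\beta \to \infty\) (temperature \(\to 0\)), \(q^{*}\) concentrates on low-cost parameters, which is the "Boltzmann-tilted" target the tempering path interpolates toward.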