Agent-Centric Visual Reinforcement Learning under Dynamic Perturbations
Zhengru Fang, Yu Guo, Fei Liu, Yuang Zhang, Yihang Tao + 3 more
TLDR
ACO-MoE robustifies visual RL against dynamic perturbations by using agent-centric restoration experts, achieving near-clean performance on a new benchmark.
Key contributions
- Introduces VDCS, a new benchmark for visual RL under non-stationary, Markov-switching perturbations.
- Proves, via information-theoretic analysis, that reconstruction-based methods fail because their objectives entangle perturbation artifacts into latent representations.
- Proposes ACO-MoE, using agent-centric experts to decouple perception from dynamic perturbations.
- ACO-MoE recovers 95.3% of clean performance on VDCS and achieves SOTA on DMControl generalization.
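To make the Markov-switching setting concrete, here is a minimal sketch of how such a non-stationary corruption schedule could be generated: the active corruption type evolves as a Markov chain over environment steps. The corruption names and transition probabilities below are illustrative stand-ins, not values from the paper.

```python
import random

# Hypothetical corruption types; VDCS's actual degradation set may differ.
CORRUPTIONS = ["gaussian_noise", "motion_blur", "fog", "clean"]

# Row-stochastic transition matrix: P[i][j] = Pr(next = j | current = i).
# High self-transition probability keeps each corruption active for a while
# before an unpredictable switch, mimicking non-stationary perturbations.
P = [
    [0.85, 0.05, 0.05, 0.05],
    [0.05, 0.85, 0.05, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.05, 0.05, 0.05, 0.85],
]

def markov_switching_schedule(steps, start=3, seed=0):
    """Return one corruption label per environment step."""
    rng = random.Random(seed)
    state = start
    labels = []
    for _ in range(steps):
        labels.append(CORRUPTIONS[state])
        # Sample the next corruption type from the current row of P.
        state = rng.choices(range(len(CORRUPTIONS)), weights=P[state])[0]
    return labels

schedule = markov_switching_schedule(1000)
```

An agent evaluated under such a schedule sees the degradation type change mid-episode, which is what distinguishes this benchmark from fixed-corruption robustness tests.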
Why it matters
Visual RL struggles with real-world dynamic perturbations, limiting its practical deployment. This paper provides a new benchmark and a robust solution, ACO-MoE, that significantly improves performance by decoupling perception from perturbations. This advances the practical applicability of visual RL in complex, unpredictable environments.
Original Abstract
Visual reinforcement learning aims to empower an agent to learn policies from visual observations, yet it remains vulnerable to dynamic visual perturbations, such as unpredictable shifts in corruption types. To systematically study this, we introduce the Visual Degraded Control Suite (VDCS), a benchmark extending DeepMind Control Suite with Markov-switching degradations to simulate non-stationary real-world perturbations. Experiments on VDCS reveal severe performance degradation in existing methods. We theoretically prove via information-theoretic analysis that this failure stems from reconstruction-based objectives inevitably entangling perturbation artifacts into latent representations. To mitigate this negative impact, we propose Agent-Centric Observations with Mixture-of-Experts (ACO-MoE) to robustify visual RL against perturbations. The proposed framework leverages unique agent-centric restoration experts, achieving restoration from corruptions and task-relevant foreground extraction, thereby decoupling perception from perturbation before being processed by the RL agent. Extensive experiments on VDCS show our ACO-MoE outperforms strong baselines, recovering 95.3% of clean performance under challenging Markov-switching corruptions. Moreover, it achieves SOTA results on DMControl Generalization with random-color and video-background perturbations, demonstrating a high level of robustness.
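The abstract's core mechanism, gated restoration experts applied before the RL agent, can be sketched as a standard mixture-of-experts combination. This is an illustrative toy, not the authors' implementation: the expert and gate weights below are random stand-ins for trained networks, and observations are flattened vectors rather than images.

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.standard_normal(64)   # stand-in for a flattened corrupted observation
num_experts = 4                 # one restoration expert per perturbation regime

# Hypothetical parameters (random stand-ins for trained weights).
W_experts = rng.standard_normal((num_experts, 64, 64)) * 0.1
W_gate = rng.standard_normal((num_experts, 64)) * 0.1

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# The gate scores how relevant each restoration expert is to this observation.
gates = softmax(W_gate @ obs)                          # weights sum to 1

# Each expert proposes a restored observation; here a linear map per expert.
expert_outs = np.einsum("kij,j->ki", W_experts, obs)   # (num_experts, 64)

# The gate-weighted mixture is the decoupled observation fed to the policy.
restored = gates @ expert_outs                         # (64,)
```

The point of the gating step is that when the Markov chain switches corruption type, the gate can shift weight to a different expert without retraining the policy, which is how perception gets decoupled from the perturbation process.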