ArXiv TLDR

GSDrive: Reinforcing Driving Policies by Multi-mode Trajectory Probing with 3D Gaussian Splatting Environment

arXiv:2604.28111

Ziang Guo, Min Chen, Xuefeng Zhang, Yixiao Zhou, Zufeng Zhang + 1 more

cs.RO

TLDR

GSDrive improves end-to-end (E2E) driving policies by using a 3D Gaussian Splatting environment to produce dense, physics-based rewards through multi-mode trajectory probing.

Key contributions

  • Leverages 3D Gaussian Splatting (3DGS) for differentiable, physics-based reward shaping.
  • Integrates a flow-matching trajectory predictor for multi-mode probing (see the sketch after this list).
  • Provides immediate, dense feedback, overcoming sparse, event-based RL rewards.
  • Establishes a bidirectional knowledge exchange between imitation learning (IL) and reinforcement learning (RL).
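
The core loop, probing several candidate trajectories and scoring each with dense rewards from the simulated scene, can be sketched in a few lines. The snippet below is a hypothetical illustration, not the released implementation: `FlowMatchingPredictor`, `GSSimulator`, and the lane-keeping reward are toy stand-ins for the paper's learned predictor and 3DGS environment.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy stand-ins for GSDrive's components (all names hypothetical) --------
class FlowMatchingPredictor:
    """Stub: a real predictor integrates a learned flow from noise to
    trajectories; here we just emit random (x, y) waypoint sequences."""
    def sample(self, obs, num_modes, horizon):
        return rng.normal(size=(num_modes, horizon, 2))

class GSSimulator:
    """Stub: the real 3DGS environment renders the reconstructed scene and
    evaluates physical interaction; this toy reward just penalizes lateral
    offset from a lane center at y = 0, giving a dense per-step signal."""
    def reset(self, obs):
        return np.zeros(2)
    def step(self, state, waypoint):
        return waypoint
    def dense_reward(self, state):
        return -abs(state[1])

# --- Multi-mode trajectory probing -------------------------------------------
def probe_candidates(predictor, simulator, obs, num_modes=8, horizon=20):
    """Roll out each candidate trajectory in the simulator and accumulate
    dense rewards, instead of waiting for a sparse terminal event."""
    candidates = predictor.sample(obs, num_modes, horizon)
    rewards = np.zeros(num_modes)
    for k, traj in enumerate(candidates):
        state = simulator.reset(obs)
        for waypoint in traj:
            state = simulator.step(state, waypoint)
            rewards[k] += simulator.dense_reward(state)
    return candidates, rewards

candidates, rewards = probe_candidates(FlowMatchingPredictor(), GSSimulator(), obs=None)
print("best mode:", int(np.argmax(rewards)))
```

In the paper, rewards from such probed rollouts supply the immediate, dense feedback that replaces sparse collision events as the RL signal.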

Why it matters

Conventional RL for autonomous driving relies on sparse, delayed rewards, so policies learn only from rare catastrophic events and converge to suboptimal behaviors. GSDrive instead provides dense, physics-based feedback at every step, improving closed-loop performance over prior simulation-based RL driving approaches on reconstructed nuScenes scenes.

Original Abstract

End-to-end (E2E) autonomous driving presents a promising approach for translating perceptual inputs directly into driving actions. However, prohibitive annotation costs and temporal data quality degradation hinder long-term real-world deployment. While combining imitation learning (IL) and reinforcement learning (RL) is a common strategy for policy improvement, conventional RL training relies on delayed, event-based rewards: policies learn only from catastrophic outcomes such as collisions, leading to premature convergence to suboptimal behaviors. To address these limitations, we introduce GSDrive, a framework that exploits 3D Gaussian Splatting (3DGS) for differentiable, physics-based reward shaping in E2E driving policy improvement. Our method incorporates a flow matching-based trajectory predictor within the 3DGS simulator, enabling multi-mode trajectory probing where candidate trajectories are rolled out to assess prospective rewards. This establishes a bidirectional knowledge exchange between IL and RL by grounding reward functions in physically simulated interaction signals, offering immediate dense feedback instead of sparse catastrophic events. Evaluated on the reconstructed nuScenes dataset, our method surpasses existing simulation-based RL driving approaches in closed-loop experiments. Code is available at https://github.com/ZionGo6/GSDrive.
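
For readers unfamiliar with the flow-matching predictor the abstract mentions, here is a minimal, self-contained sketch of flow-matching sampling in general: Euler integration of a velocity field from Gaussian noise (t = 0) to a trajectory (t = 1). The hand-coded field and straight-line target are illustrative stand-ins for a learned network; none of this comes from GSDrive's code.

```python
import numpy as np

rng = np.random.default_rng(0)
HORIZON, DIM, STEPS = 20, 2, 10  # waypoints, (x, y), ODE integration steps

def velocity_field(x, t, target):
    """Stub for a learned field v_theta(x, t). For the straight interpolation
    path x_t = (1 - t) * x0 + t * x1, the conditional velocity toward a fixed
    target x1 is (x1 - x_t) / (1 - t)."""
    return (target - x) / (1.0 - t)

def sample_trajectory():
    """Euler-integrate the flow from Gaussian noise (t=0) to a trajectory (t=1)."""
    target = np.stack([np.linspace(0.0, 20.0, HORIZON), np.zeros(HORIZON)], -1)
    x = rng.normal(size=(HORIZON, DIM))  # one noisy waypoint sequence
    dt = 1.0 / STEPS
    for k in range(STEPS):
        t = k * dt  # t stays in [0, 1), so 1 - t never hits zero
        x = x + dt * velocity_field(x, t, target)
    return x

traj = sample_trajectory()
print(traj[:3])  # first few (x, y) waypoints of the sampled mode
```

In a trained predictor, a network fitted with the flow-matching objective would replace `velocity_field`, and conditioning on scene observations would yield distinct, scene-aware trajectory modes to probe.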
