A Comparison of Reinforcement Learning and Optimal Control Methods for Path Planning
Qiang Le, Yaguang Yang, Isaac E. Weintraub
TLDR
DDPG offers a faster, real-time path planning solution for autonomous vehicles in threat environments compared to traditional optimal control.
Key contributions
- Proposes a DDPG-based method for real-time autonomous vehicle path planning in threat zones.
- Trains an agent to map states (position and velocity) to actions that steer the vehicle around the circular 'no-go' zone to the goal (see the sketch after this list).
- Identifies the "feasible set" of starting points from which safe mission completion is guaranteed.
- Demonstrates DDPG's significant speed advantage over optimal control for real-time path planning.
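The paper's DDPG setup uses an actor that maps the vehicle's state directly to a control action and a critic that scores state-action pairs. A minimal PyTorch sketch is given below; the 4-D state (position, velocity), 2-D bounded action, and layer sizes are illustrative assumptions, since the paper's exact architecture is not reproduced here.

```python
# Hedged sketch of a DDPG actor/critic pair (not the authors' published code).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, ACTION_BOUND = 4, 2, 1.0  # assumed dimensions and bound

class Actor(nn.Module):
    """Deterministic policy: maps state -> bounded control action."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, ACTION_DIM), nn.Tanh(),  # keeps the action in [-1, 1]
        )

    def forward(self, state):
        return ACTION_BOUND * self.net(state)

class Critic(nn.Module):
    """Q-function: scores a (state, action) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))
```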
Why it matters
This paper addresses the critical need for real-time path planning in autonomous systems operating in hazardous environments. By demonstrating DDPG's speed advantage over traditional optimal control, it offers a practical approach for mission planning. The concept of a "feasible set" also provides valuable pre-mission intelligence.
Original Abstract
Path-planning for autonomous vehicles in threat-laden environments is a fundamental challenge. While traditional optimal control methods can find ideal paths, the computational time is often too slow for real-time decision-making. To solve this challenge, we propose a method based on Deep Deterministic Policy Gradient (DDPG) and model the threat as a simple, circular 'no-go' zone. A mission failure is claimed if the vehicle enters this 'no-go' zone at any time or does not reach a neighborhood of the destination. The DDPG agent is trained to learn a direct mapping from its current state (position and velocity) to a series of feasible actions that guide the agent to safely reach its goal. A reward function and two neural networks, critic and actor, are used to describe the environment and guide the control efforts. The DDPG trains the agent to find the largest possible set of starting points ("feasible set") wherein a safe path to the goal is guaranteed. This provides critical information for mission planning, showing beforehand whether a task is achievable from a given starting point, assisting pre-mission planning activities. The approach is validated in simulation. A comparison between the DDPG method and a traditional optimal control (pseudo-spectral) method is carried out. The results show that the learning-based agent may produce effective paths while being significantly faster, making it a better fit for real-time applications. However, there are areas ("infeasible set") where the DDPG agent cannot find paths to the destination, and the paths in the feasible set may not be optimal. These preliminary results guide our future research: (1) improve the reward function to enlarge the DDPG feasible set, (2) examine the feasible set obtained by the pseudo-spectral method, and (3) investigate the arc-search IPM method for the path planning problem.
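As a rough illustration of the episode logic the abstract describes (failure on entering the circular 'no-go' zone, success on reaching a neighborhood of the destination), here is a minimal sketch of the per-step reward and termination check. The threat center, radii, goal location, and reward magnitudes are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of the episode outcome logic described in the abstract.
import numpy as np

THREAT_CENTER = np.array([5.0, 5.0])   # assumed circular 'no-go' zone center
THREAT_RADIUS = 2.0                    # assumed zone radius
GOAL = np.array([10.0, 10.0])          # assumed destination
GOAL_RADIUS = 0.5                      # "neighborhood of the destination"

def step_outcome(position: np.ndarray) -> tuple[float, bool]:
    """Return (reward, done) for the vehicle's current position."""
    if np.linalg.norm(position - THREAT_CENTER) <= THREAT_RADIUS:
        return -100.0, True            # entered the no-go zone: mission failure
    if np.linalg.norm(position - GOAL) <= GOAL_RADIUS:
        return 100.0, True             # reached the goal neighborhood: success
    return -0.1, False                 # small step cost encourages shorter paths
```

The DDPG agent is trained against an outcome rule of this shape; the "feasible set" is then the set of starting positions from which the learned policy reaches the goal neighborhood without ever triggering the failure branch.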