ArXiv TLDR

Physics-Informed Reinforcement Learning of Spatial Density Velocity Potentials for Map-Free Racing

arXiv:2604.09499

Shathushan Sivashangaran, Apoorva Khairnar, Sepideh Gohari, Vihaan Dutta, Azim Eskandarian

cs.RO

TLDR

A DRL method combines a physics-informed reward with spectral depth features for map-free racing, outperforming human drivers while using under 1% of the computation of prior methods.
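The "spectral depth" observation can be pictured as compressing a 1-D depth scan (e.g. a lidar sweep) into the magnitudes of its lowest spatial frequencies. A minimal sketch, assuming a lidar-style scan; the bin count and normalization are illustrative choices, not details from the paper:

```python
import numpy as np

def spectral_depth_features(depth_scan, n_bins=16):
    """Compress a 1-D depth scan into low-frequency spectral magnitudes.

    n_bins and the norm-based scaling are illustrative assumptions,
    not the paper's exact encoding.
    """
    scan = np.asarray(depth_scan, dtype=float)
    spectrum = np.abs(np.fft.rfft(scan))            # magnitude spectrum
    feats = spectrum[:n_bins]                       # keep low frequencies
    return feats / (np.linalg.norm(feats) + 1e-8)  # scale-invariant
```

A compact spectral summary like this keeps the observation dimension fixed regardless of scan resolution, which is one way such an encoding could cut per-step computation.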

Key contributions

  • Presents a DRL method using physics-informed rewards and spectral depth measurements for map-free racing.
  • Infers time-optimal and overtaking controls with an ANN, using less than 1% of the computation of BC and model-based DRL.
  • Eliminates sim-to-reality transfer issues via an exploit-aware reward and implicit collision handling.
  • Outperforms human demonstrations by 12% on OOD tracks on proportionally scaled hardware, with tire behavior that maximizes the friction circle.
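The implicit collision handling above amounts to a change in the value target: rather than adding a large negative collision penalty and continuing to bootstrap, the episode simply ends at a collision, so all future return past it is lost. A minimal sketch; the discount factor and penalty value are illustrative, not from the paper:

```python
GAMMA = 0.99  # illustrative discount factor

def td_target_explicit_penalty(reward, next_value, collided, penalty=-100.0):
    # Conventional approach: add a hand-tuned negative penalty on collision
    # and keep bootstrapping from the next state's value estimate.
    r = reward + (penalty if collided else 0.0)
    return r + GAMMA * next_value

def td_target_implicit_truncation(reward, next_value, collided):
    # Alternative (as described in the abstract): no explicit penalty.
    # The value horizon is truncated at a collision, so no future value
    # is bootstrapped past it; the forfeited return is the penalty.
    if collided:
        return reward  # episode terminates; V(terminal) = 0
    return reward + GAMMA * next_value
```

Dropping the hand-tuned penalty removes one source of variance-induced conservatism: the policy is no longer shaped by an arbitrary penalty magnitude, only by the reward it can no longer collect.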

Why it matters

Autonomous racing without maps is challenging due to kinodynamic planning and OOD generalization. This paper offers a DRL solution that avoids both the sim-to-reality gap and the computational overhead of prior methods, and demonstrates above-human performance on real, scaled hardware for map-free autonomous control.

Original Abstract

Autonomous racing without prebuilt maps is a grand challenge for embedded robotics that requires kinodynamic planning from instantaneous sensor data at the acceleration and tire friction limits. Out-Of-Distribution (OOD) generalization to various racetrack configurations utilizes Machine Learning (ML) to encode the mathematical relation between sensor data and vehicle actuation for end-to-end control, with implicit localization. These comprise Behavioral Cloning (BC) that is capped to human reaction times and Deep Reinforcement Learning (DRL) which requires large-scale collisions for comprehensive training that can be infeasible without simulation but is arduous to transfer to reality, thus exhibiting greater performance than BC in simulation, but actuation instability on hardware. This paper presents a DRL method that parameterizes nonlinear vehicle dynamics from the spectral distribution of depth measurements with a non-geometric, physics-informed reward, to infer vehicle time-optimal and overtaking racing controls with an Artificial Neural Network (ANN) that utilizes less than 1% of the computation of BC and model-based DRL. Slaloming from simulation to reality transfer and variance-induced conservatism are eliminated with the combination of a physics engine exploit-aware reward and the replacement of an explicit collision penalty with an implicit truncation of the value horizon. The policy outperforms human demonstrations by 12% in OOD tracks on proportionally scaled hardware, by maximizing the friction circle with tire dynamics that resemble an empirical Pacejka tire model. System identification illuminates a functional bifurcation where the first layer compresses spatial observations to extract digitized track features with higher resolution in corner apexes, and the second encodes nonlinear dynamics.
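The abstract notes that the learned tire dynamics resemble an empirical Pacejka tire model. For reference, the Pacejka "magic formula" for normalized lateral tire force is sketched below; the coefficients B, C, D, E are illustrative defaults, not values fitted in the paper:

```python
import math

def pacejka_lateral_force(slip_angle_rad, B=10.0, C=1.9, D=1.0, E=0.97):
    """Pacejka 'magic formula' for normalized lateral tire force.

    B (stiffness), C (shape), D (peak), and E (curvature) are
    illustrative coefficients, not values from the paper.
    """
    x = B * slip_angle_rad
    return D * math.sin(C * math.atan(x - E * (x - math.atan(x))))
```

The formula is odd in the slip angle and peaks at |force| = D, which is why a policy whose lateral forces trace this curve is operating near the friction limit rather than in the linear low-slip regime.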
