Neuromorphic Reinforcement Learning for Quadruped Locomotion Control on Uneven Terrain
Zhuangyu Han, Abhronil Sengupta
TLDR
This paper introduces a neuromorphic reinforcement learning framework using equilibrium propagation for quadruped locomotion on uneven terrain, enabling on-robot adaptation.
Key contributions
- Proposes an equilibrium-propagation (EP)-based PPO framework for quadruped locomotion on uneven terrain.
- Combines a bio-inspired central pattern generator (CPG) with a residual postural adjustment policy.
- Introduces an EP-compatible PPO output-nudging signal and a two-sided ratio-clipping mechanism that stabilizes policy updates during relaxation (see the sketch after this list).
- Achieves locomotion performance comparable to a backpropagation-trained PPO baseline while improving GPU memory efficiency 4.3× over BPTT.
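The paper's exact clipping rule is not reproduced in this digest, so the following is a minimal NumPy sketch of one plausible reading: clip the probability ratio on both sides before forming the surrogate, so the objective (and any EP nudge derived from it) stays bounded whatever the sign of the advantage. The function name and default `eps` are illustrative, not taken from the paper.

```python
import numpy as np

def two_sided_clip_surrogate(logp_new, logp_old, adv, eps=0.2):
    """Illustrative two-sided ratio clip for a PPO surrogate.

    Standard PPO uses min(r * A, clip(r, 1-eps, 1+eps) * A), which only
    bounds the objective on the side that would inflate it. Hard-clipping
    the ratio itself to [1-eps, 1+eps] bounds the surrogate, and hence a
    nudging signal derived from it, in both directions.
    """
    ratio = np.exp(logp_new - logp_old)             # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)  # two-sided hard clip
    return float(np.mean(clipped * adv))            # surrogate to maximize

# Toy batch: per-sample log-probs under the new/old policies and advantages.
logp_new = np.array([-1.0, -0.5, -2.0])
logp_old = np.array([-1.2, -0.9, -1.1])
adv      = np.array([ 0.8, -0.3,  1.5])
print(two_sided_clip_surrogate(logp_new, logp_old, adv))
```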
Why it matters
This work provides an algorithmic foundation for low-power, on-robot adaptation in quadruped robots. By replacing global backpropagation with local, state-driven learning, it paves the way for energy-efficient, real-time policy adjustment on neuromorphic and in-memory computing hardware, which is crucial for robust autonomous systems.
Original Abstract
Reinforcement learning (RL) has enabled robust quadruped locomotion over complex terrain, but most learned controllers are trained offline with backpropagation in massively parallel simulation and deployed as fixed policies, limiting adaptation to terrain variation, payload changes, actuator wear, and other real-world conditions under onboard power constraints. Local learning provides a potential path toward energy-aware on-robot adaptation by replacing global backpropagation graphs with updates driven by local neural states, making the learning rule more compatible with neuromorphic and in-memory computing substrates. This work proposes an equilibrium-propagation (EP)-based proximal policy optimization (PPO) framework for uneven-terrain quadruped locomotion. The controller combines a bio-inspired central pattern generator (CPG) policy with a residual postural adjustment policy, while replacing conventional backpropagation-trained policy and value networks with EP-enabled local learning. To train stochastic continuous-control policies with EP, we derive an EP-compatible PPO output-nudging signal and introduce a two-sided ratio clipping mechanism that stabilizes policy updates during relaxation. Experiments on a 12-DoF A1 quadruped show that the proposed controller achieves stable policy convergence in a two-stage uneven terrain locomotion task. Its locomotion performance is comparable to a backpropagation-trained PPO baseline in success rate, velocity tracking, actuator power, and body stability, while improving GPU memory efficiency by 4.3× compared with backpropagation through time (BPTT). These results suggest that local equilibrium-based learning can support high-dimensional embodied locomotion and provide an algorithmic foundation for low-power on-robot adaptation and fine-tuning.
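For readers unfamiliar with equilibrium propagation, here is a toy NumPy sketch of the generic two-phase EP update on a small Hopfield-style network. It is not the paper's controller: the network, dynamics, hyperparameters, and the squared-error nudge are illustrative stand-ins (the paper derives a PPO-surrogate nudging signal instead), but it shows why the learning rule is local and avoids storing a backpropagation graph across relaxation steps.

```python
import numpy as np

def rho(s):
    """Hard-sigmoid activation, common in EP formulations."""
    return np.clip(s, 0.0, 1.0)

def relax(s, W, x, beta=0.0, target=None, steps=60, dt=0.5):
    """Settle the state s by descending a Hopfield-style energy.

    With beta > 0, the last len(target) units are additionally pulled
    toward `target`, implementing EP's second (nudged) phase. The paper
    replaces this squared-error pull with a PPO-derived output nudge.
    """
    for _ in range(steps):
        ds = -s + rho(s) @ W + x            # leak + recurrent drive + input
        if beta > 0.0:
            k = len(target)
            ds[-k:] += beta * (target - s[-k:])
        s = s + dt * ds
    return s

def ep_update(W, x, target, beta=0.1, lr=0.05):
    """Two-phase EP update: contrast free and nudged fixed points.

    The weight change is a purely local contrastive-Hebbian rule
    (difference of activity outer products); no computation graph is
    kept across relaxation steps, which is where the memory savings
    over BPTT come from.
    """
    n = W.shape[0]
    s_free  = relax(np.zeros(n), W, x)                        # free phase
    s_nudge = relax(s_free, W, x, beta=beta, target=target)   # nudged phase
    dW = (np.outer(rho(s_nudge), rho(s_nudge))
          - np.outer(rho(s_free), rho(s_free))) / beta
    np.fill_diagonal(dW, 0.0)               # no self-connections
    return W + lr * dW
```

Because both phases only require running the same settling dynamics forward and comparing local activities, the update can in principle be computed in place on neuromorphic or in-memory hardware, which is the property the abstract's memory-efficiency comparison against BPTT rests on.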