Fuzzy Logic Theory-based Adaptive Reward Shaping for Robust Reinforcement Learning (FARS)
Hürkan Şahin, Van Huyen Dang, Erdi Sayar, Alper Yegenoglu, Erdal Kayacan
TLDR
FARS introduces fuzzy logic-based adaptive reward shaping to improve RL exploration, stability, and convergence in complex, real-world tasks.
Key contributions
- Proposes FARS, a fuzzy logic-based adaptive reward shaping method for robust reinforcement learning.
- Integrates human intuition and expert knowledge into RL reward design via interpretable fuzzy rules.
- Adapts reward contributions to the agent's state, enabling smoother transitions between fast motion and precise control in challenging navigation tasks.
- Achieves faster convergence, reduced variability across training seeds, and up to ~5% higher success rates on drone racing benchmarks.
Why it matters
This paper addresses critical RL challenges such as sparse rewards and slow exploration in complex environments. By leveraging fuzzy logic, it offers a more stable and sample-efficient learning approach, which could advance real-world RL applications, especially in robotics and autonomous systems.
Original Abstract
Reinforcement learning (RL) often struggles in real-world tasks with high-dimensional state spaces and long horizons, where sparse or fixed rewards severely slow down exploration and cause agents to get trapped in local optima. This paper presents a fuzzy logic-based reward shaping method that integrates human intuition into RL reward design. By encoding expert knowledge into adaptive and interpretable terms, fuzzy rules promote stable learning and reduce sensitivity to hyperparameters. The proposed method leverages these properties to adapt reward contributions based on the agent's state, enabling smoother transitions between fast motion and precise control in challenging navigation tasks. Extensive simulation results on autonomous drone racing benchmarks show stable learning behavior and consistent task performance across scenarios of increasing difficulty. The proposed method achieves faster convergence and reduced performance variability across training seeds in more challenging environments, with success rates improving by up to approximately 5 percent compared to non-fuzzy reward formulations.
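The digest does not reproduce the paper's rule base, but the core mechanism, fuzzy membership functions that blend competing reward terms as a function of the agent's state, can be sketched briefly. In the minimal Python sketch below, the distance breakpoints (2 m and 10 m), the two reward terms, and all names (shoulder_up, shoulder_down, fuzzy_shaped_reward) are illustrative assumptions, not FARS's actual formulation:

```python
import numpy as np

def shoulder_up(x: float, a: float, b: float) -> float:
    """Membership ramp: 0 below a, rising linearly to 1 at b."""
    return float(np.clip((x - a) / (b - a), 0.0, 1.0))

def shoulder_down(x: float, a: float, b: float) -> float:
    """Complementary ramp: 1 below a, falling linearly to 0 at b."""
    return 1.0 - shoulder_up(x, a, b)

def fuzzy_shaped_reward(dist_to_gate: float,
                        speed_reward: float,
                        precision_reward: float) -> float:
    """Blend two reward terms with two fuzzy rules (illustrative):

      IF distance-to-gate is NEAR THEN weight the precision term.
      IF distance-to-gate is FAR  THEN weight the speed term.

    A Sugeno-style weighted average defuzzifies the rules, so the
    shaped reward shifts smoothly from rewarding fast motion far from
    the gate to rewarding precise control close to it.
    """
    mu_near = shoulder_down(dist_to_gate, 2.0, 10.0)  # ~1 near the gate
    mu_far = shoulder_up(dist_to_gate, 2.0, 10.0)     # ~1 far away
    # mu_near + mu_far == 1 by construction, so this is a convex blend.
    return mu_near * precision_reward + mu_far * speed_reward
```

For example, fuzzy_shaped_reward(1.0, speed_reward=0.2, precision_reward=1.0) returns 1.0 (fully precision-driven), while at 15 m it returns 0.2 (fully speed-driven), with intermediate distances interpolating smoothly. A richer rule base over more state variables (velocity, gate alignment) would follow the same pattern.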