ArXiv TLDR

EvoNav: Evolutionary Reward Function Design for Robot Navigation with Large Language Models

arXiv: 2605.11859

Zhikai Zhao, Chuanbo Hua, Federico Berto, Zihan Ma, Kanghoon Lee + 2 more

cs.RO · cs.AI

TLDR

EvoNav uses LLMs within an evolutionary framework, paired with an efficient three-stage evaluation procedure, to automatically design high-quality reward functions for robot navigation.

Key contributions

  • Proposes EvoNav, an evolutionary framework for automated robot navigation reward function design using LLMs.
  • Introduces a three-stage warm-up-boost evaluation procedure for efficient candidate reward function assessment.
  • Progresses from low-cost analytical proxies to lightweight rollouts and full policy training for computational efficiency.
  • Achieves more effective navigation policies than hand-crafted rewards and state-of-the-art reward design methods.
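The progressive evaluation above can be sketched as a filtering funnel: cheap proxy scores prune most LLM-proposed candidates, lightweight rollouts prune further, and only the survivors pay for full policy training. The sketch below is a minimal illustration of that idea; all function names, thresholds, and the toy navigation dynamics are assumptions for illustration, not the authors' implementation.

```python
import random

def proxy_score(reward_fn):
    """Stage 1: low-cost analytical proxy -- score the candidate reward
    on a tiny set of logged (distance, previous_distance) transitions."""
    transitions = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]  # toy data
    return sum(reward_fn(d, p) for d, p in transitions) / len(transitions)

def rollout_score(reward_fn, n_rollouts=5):
    """Stage 2: lightweight rollouts of a random policy in a toy
    1-D goal-reaching environment (distance shrinks stochastically)."""
    total = 0.0
    for _ in range(n_rollouts):
        dist = 1.0
        for _ in range(10):
            prev, dist = dist, max(0.0, dist - random.random() * 0.2)
            total += reward_fn(dist, prev)
    return total / n_rollouts

def full_training_score(reward_fn):
    """Stage 3: stand-in for full policy training, the expensive step
    reserved for the few candidates that survive stages 1 and 2."""
    return rollout_score(reward_fn, n_rollouts=50)

def evaluate_candidates(candidates, keep=2):
    """Spend compute progressively: rank cheaply, keep the best,
    then re-rank the survivors with costlier evaluations."""
    survivors = sorted(candidates, key=proxy_score, reverse=True)[: keep * 2]
    survivors = sorted(survivors, key=rollout_score, reverse=True)[:keep]
    return max(survivors, key=full_training_score)

# Two toy candidate reward functions, as an LLM might propose them:
dense = lambda dist, prev: prev - dist               # reward progress to goal
sparse = lambda dist, prev: 1.0 if dist == 0 else 0.0  # reward only at goal

best = evaluate_candidates([dense, sparse])
```

In EvoNav the candidates come from an LLM and the winner's feedback seeds the next evolutionary generation; here the funnel shape is the point, not the toy scores.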

Why it matters

Hand-crafting reward functions for robot navigation is challenging and often leads to suboptimal performance. EvoNav automates this crucial step using LLMs, significantly improving policy quality. This approach makes advanced robot navigation more accessible and efficient to develop.

Original Abstract

Robot navigation is a crucial task with applications to social robots in dynamic human environments. While Reinforcement Learning (RL) has shown great promise for this problem, the policy quality is highly sensitive to the specification of reward functions. Hand-crafted rewards require substantial domain expertise and embed inductive biases that are difficult to audit or adapt, limiting their effectiveness and leading to suboptimal performance. In this paper, we propose EvoNav, an evolutionary framework that automates the design of robot navigation reward functions via large language models (LLMs). To overcome prohibitively costly policy training, EvoNav evaluates each candidate proposal from the LLM via a progressive three-stage warm-up-boost procedure. EvoNav advances from analytical proxies with low-cost surrogates, such as small datasets and analytic rules, to lightweight rollouts and, finally, to full policy training, enabling computationally efficient exploration under effective feedback. Experiment results show that EvoNav produces more effective navigation policies than manually designed RL rewards and state-of-the-art reward design methods.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.