ArXiv TLDR

Efficient Hierarchical Implicit Flow Q-learning for Offline Goal-conditioned Reinforcement Learning

2604.08960

Zhiqiang Dong, Teng Pang, Rongjian Xu, Guoqiang Wu

cs.LG

TLDR

HIFQL introduces a goal-conditioned mean flow policy and a LeJEPA loss to improve long-horizon control in offline goal-conditioned reinforcement learning (GCRL).

Key contributions

  • Proposes a goal-conditioned mean flow policy for hierarchical offline GCRL.
  • Uses a learned average velocity field to capture complex target distributions for both high-level and low-level policies (see the sketch after this list).
  • Introduces a LeJEPA loss that repels goal embeddings to learn more discriminative representations.
  • Achieves strong performance on state-based and pixel-based OGBench tasks.
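
The one-step sampling idea is easiest to see in code. Below is a minimal, hypothetical PyTorch sketch of action generation with a goal-conditioned mean flow policy: a network predicts an average velocity u(z, r, t | s, g) over a time interval, so an action is obtained by pushing Gaussian noise through a single network evaluation over [0, 1]. The class and method names (MeanFlowPolicy, avg_velocity, act), the architecture, and the time convention are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MeanFlowPolicy(nn.Module):
    """Goal-conditioned policy predicting an average velocity field u(z, r, t | s, g)."""

    def __init__(self, state_dim, goal_dim, action_dim, hidden=256):
        super().__init__()
        self.action_dim = action_dim
        # Inputs: noisy action z, interval endpoints (r, t), state s, goal g.
        self.net = nn.Sequential(
            nn.Linear(action_dim + 2 + state_dim + goal_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def avg_velocity(self, z, r, t, state, goal):
        return self.net(torch.cat([z, r, t, state, goal], dim=-1))

    @torch.no_grad()
    def act(self, state, goal):
        # One-step sampling: instead of integrating an instantaneous velocity
        # with many ODE steps, apply the learned *average* velocity over the
        # whole interval [0, 1] in a single network evaluation.
        batch = state.shape[0]
        z0 = torch.randn(batch, self.action_dim, device=state.device)  # noise sample
        r = torch.zeros(batch, 1, device=state.device)                 # interval start
        t = torch.ones(batch, 1, device=state.device)                  # interval end
        u = self.avg_velocity(z0, r, t, state, goal)
        return z0 + (t - r) * u  # one-step transport of noise to an action
```

In a hierarchical setup, the same construction would be used twice: the high-level policy samples a subgoal and the low-level policy samples an action conditioned on it.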

Why it matters

Long-horizon control in offline GCRL is challenging due to limited policy expressiveness and ineffective subgoal generation. This work addresses both issues by modeling more expressive, non-Gaussian policy distributions with a mean flow and learning more discriminative goal representations, improving performance on long-horizon offline GCRL tasks.

Original Abstract

Offline goal-conditioned reinforcement learning (GCRL) is a practical reinforcement learning paradigm that aims to learn goal-conditioned policies from reward-free offline data. Despite recent advances in hierarchical architectures such as HIQL, long-horizon control in offline GCRL remains challenging due to the limited expressiveness of Gaussian policies and the inability of high-level policies to generate effective subgoals. To address these limitations, we propose the goal-conditioned mean flow policy, which introduces an average velocity field into hierarchical policy modeling for offline GCRL. Specifically, the mean flow policy captures complex target distributions for both high-level and low-level policies through a learned average velocity field, enabling efficient action generation via one-step sampling. Furthermore, considering the insufficiency of goal representation, we introduce a LeJEPA loss that repels goal representation embeddings during training, thereby encouraging more discriminative representations and improving generalization. Experimental results show that our method achieves strong performance across both state-based and pixel-based tasks in the OGBench benchmark.
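
To make the representation-learning piece concrete, here is an illustrative PyTorch stand-in for the repulsion effect the LeJEPA loss is described as having: it penalizes pairwise cosine similarity between goal embeddings within a batch so that distinct goals are pushed apart. This is a hedged sketch of the repulsion idea only, not the paper's exact LeJEPA objective, and the function name goal_repulsion_loss is hypothetical.

```python
import torch
import torch.nn.functional as F

def goal_repulsion_loss(goal_embeddings: torch.Tensor) -> torch.Tensor:
    """goal_embeddings: (batch, dim) outputs of the goal encoder."""
    z = F.normalize(goal_embeddings, dim=-1)        # unit-norm embeddings
    sim = z @ z.t()                                 # pairwise cosine similarities
    n = z.shape[0]
    off_diag = sim - torch.eye(n, device=z.device)  # drop self-similarity (always 1)
    # Penalize similarity between distinct goals, repelling their embeddings.
    return (off_diag ** 2).sum() / (n * (n - 1))
```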

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.