ArXiv TLDR

Fisher Decorator: Refining Flow Policy via A Local Transport Map

arXiv: 2604.17919

Xiaoyuan Cheng, Haoyu Wang, Wenxuan Yuan, Ziyan Wang, Zonghao Chen + 2 more

cs.LG, cs.RO

TLDR

Fisher Decorator refines flow policies in offline RL by composing the initial flow with a local transport map and optimizing the residual anisotropically via the Fisher information matrix, outperforming prior isotropic (L2/W2-regularized) methods.

Key contributions

  • Addresses the geometric mismatch in flow-based offline RL caused by isotropic L2/W2 regularization of an inherently anisotropic behavioral policy manifold.
  • Introduces Fisher Decorator, a local transport map that augments the initial flow policy with a residual displacement.
  • Derives an anisotropic optimization formulation, a local quadratic approximation of the KL-constrained objective governed by the Fisher information matrix.
  • Achieves state-of-the-art performance by correcting the optimality gap introduced by isotropic approximations.
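To make the contributions above concrete, here is a minimal numpy sketch of one anisotropic refinement step, not the paper's implementation: given an action from the initial flow policy, a critic gradient, and a score estimate (the abstract notes the score is embedded in the flow velocity), it builds a damped outer-product Fisher estimate and solves the linear-in-direction objective under the quadratic Fisher constraint. All names and the trust-region scaling are illustrative assumptions.

```python
import numpy as np

def fisher_refined_step(a0, q_grad, score, eps=0.1, damping=1e-3):
    """Anisotropic refinement sketch (illustrative, not the paper's code).

    Moves the initial flow action a0 along a residual displacement that
    maximizes q_grad . delta subject to the Fisher-weighted quadratic
    trust region delta^T F delta <= eps.

    a0:     initial action from the flow policy, shape (d,)
    q_grad: gradient of the critic Q w.r.t. the action at a0, shape (d,)
    score:  score estimate at a0 (e.g. read off the flow velocity), shape (d,)
    """
    d = a0.shape[0]
    # Empirical (outer-product) Fisher estimate, damped for invertibility.
    F = np.outer(score, score) + damping * np.eye(d)
    # Natural-gradient-style direction: delta is proportional to F^{-1} q_grad.
    direction = np.linalg.solve(F, q_grad)
    # Scale to the boundary of the constraint, delta^T F delta = eps.
    quad = direction @ F @ direction
    delta = np.sqrt(eps / max(quad, 1e-12)) * direction
    return a0 + delta

rng = np.random.default_rng(0)
a0 = rng.normal(size=4)
refined = fisher_refined_step(a0, q_grad=rng.normal(size=4), score=rng.normal(size=4))
print(refined.shape)  # (4,)
```

The contrast with prior isotropic methods is visible in the solve: replacing F with the identity would recover a plain (isotropic) gradient step, whereas F reweights the displacement by the local density geometry.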

Why it matters

Existing flow-based offline RL methods regularize with isotropic L2/W2 penalties that ignore the anisotropic geometry of the behavioral policy, leading to misaligned optimization directions and suboptimal policies. Fisher Decorator replaces this with an anisotropic, Fisher-guided refinement whose approximation error is controllable, yielding a more accurate and efficient approach and state-of-the-art offline RL benchmark results.

Original Abstract

Recent advances in flow-based offline reinforcement learning (RL) have achieved strong performance by parameterizing policies via flow matching. However, they still face critical trade-offs among expressiveness, optimality, and efficiency. In particular, existing flow policies interpret the $L_2$ regularization as an upper bound of the 2-Wasserstein distance ($W_2$), which can be problematic in offline settings. This issue stems from a fundamental geometric mismatch: the behavioral policy manifold is inherently anisotropic, whereas the $L_2$ (or upper bound of $W_2$) regularization is isotropic and density-insensitive, leading to systematically misaligned optimization directions. To address this, we revisit offline RL from a geometric perspective and show that policy refinement can be formulated as a local transport map: an initial flow policy augmented by a residual displacement. By analyzing the induced density transformation, we derive a local quadratic approximation of the KL-constrained objective governed by the Fisher information matrix, enabling a tractable anisotropic optimization formulation. By leveraging the score function embedded in the flow velocity, we obtain a corresponding quadratic constraint for efficient optimization. Our results reveal that the optimality gap in prior methods arises from their isotropic approximation. In contrast, our framework achieves a controllable approximation error within a provable neighborhood of the optimal solution. Extensive experiments demonstrate state-of-the-art performance across diverse offline RL benchmarks. See project page: https://github.com/ARC0127/Fisher-Decorator.
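For orientation, the "local quadratic approximation of the KL-constrained objective governed by the Fisher information matrix" mentioned in the abstract rests on the standard second-order expansion of KL divergence around the current policy; in generic notation (mine, not the paper's), with a small parameter perturbation $\Delta\theta$:

```latex
\mathrm{KL}\big(\pi_{\theta}\,\|\,\pi_{\theta+\Delta\theta}\big)
  \approx \tfrac{1}{2}\,\Delta\theta^{\top} F(\theta)\,\Delta\theta,
\qquad
F(\theta) = \mathbb{E}_{a \sim \pi_\theta}\!\big[
  \nabla_\theta \log \pi_\theta(a)\,\nabla_\theta \log \pi_\theta(a)^{\top}\big]
```

The first-order term vanishes because the expected score is zero, which is why the Fisher matrix alone governs the local geometry of the constraint.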
