Offline-Online Reinforcement Learning for Linear Mixture MDPs
Zhongjun Zhang, Sean R. Sinclair
TLDR
An adaptive offline-online RL algorithm for linear mixture MDPs leverages offline data only when it is beneficial: it improves over purely online learning when the data are informative and matches online-only performance otherwise.
Key contributions
- Proposes an adaptive algorithm for offline-online RL in linear mixture MDPs with environment shift.
- The algorithm intelligently leverages informative offline data to improve over purely online learning.
- It safely ignores uninformative offline data, matching online-only performance without degradation (both behaviors are illustrated in the sketch after this list).
- Provides regret upper and lower bounds characterizing when offline data is beneficial.
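To make the adaptivity concrete, here is a minimal Python sketch of the general idea: estimate the linear mixture parameter by ridge regression and pool the offline data only when a simple shift test suggests they are consistent with the online data. Everything here, the function names, the Mahalanobis-style shift statistic, and the threshold `tau`, is an illustrative assumption, not the paper's actual algorithm or confidence sets.

```python
import numpy as np


def ridge_fit(X, y, lam=1.0):
    """Ridge regression: solve (X^T X + lam*I) theta = X^T y."""
    d = X.shape[1]
    cov = X.T @ X + lam * np.eye(d)
    theta = np.linalg.solve(cov, X.T @ y)
    return theta, cov


def adaptive_estimate(X_off, y_off, X_on, y_on, lam=1.0, tau=1.0):
    """Estimate the (linear mixture) parameter, pooling offline data only
    when a crude shift test passes. `tau` and the test statistic are
    hypothetical placeholders, not the paper's actual criterion."""
    theta_on, cov_on = ridge_fit(X_on, y_on, lam)
    theta_off, _ = ridge_fit(X_off, y_off, lam)

    # Illustrative shift statistic: squared Mahalanobis distance between
    # the two estimates under the online design matrix.
    diff = theta_on - theta_off
    shift_stat = float(diff @ cov_on @ diff)

    if shift_stat <= tau:
        # Offline data look consistent with the target environment: pool.
        X = np.vstack([X_off, X_on])
        y = np.concatenate([y_off, y_on])
        theta, _ = ridge_fit(X, y, lam)
        return theta, "pooled"
    # Offline data look mismatched: fall back to online-only estimation.
    return theta_on, "online-only"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, noise = 5, 0.1
    theta_true = rng.normal(size=d)
    X_on = rng.normal(size=(50, d))
    y_on = X_on @ theta_true + noise * rng.normal(size=50)
    X_off = rng.normal(size=(200, d))
    y_off = X_off @ theta_true + noise * rng.normal(size=200)
    theta, mode = adaptive_estimate(X_off, y_off, X_on, y_on)
    print(mode)  # expect "pooled": offline data come from the same environment
```

A real implementation would replace the hard threshold with the paper's data-driven criterion and couple the pooled estimate with optimistic exploration.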
Why it matters
Offline data are abundant but often come from a mismatched environment or behavior policy, which makes them risky to use naively in RL. This work provides a principled way to integrate such data into online learning, guaranteeing performance gains when the data are useful and no degradation when they are not.
Original Abstract
We study offline-online reinforcement learning in linear mixture Markov decision processes (MDPs) under environment shift. In the offline phase, data are collected by an unknown behavior policy and may come from a mismatched environment, while in the online phase the learner interacts with the target environment. We propose an algorithm that adaptively leverages offline data. When the offline data are informative, either due to sufficient coverage or small environment shift, the algorithm provably improves over purely online learning. When the offline data are uninformative, it safely ignores them and matches the online-only performance. We establish regret upper bounds that explicitly characterize when offline data are beneficial, together with nearly matching lower bounds. Numerical experiments further corroborate our theoretical findings.
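For context, here are standard definitions often used in this setting (a generic formalization, not taken from the paper, whose exact notation may differ): $V_1$ denotes the value at the first step of an episode and $\pi_k$ the policy deployed in online episode $k$.

```latex
% Linear mixture MDP: the transition kernel is linear in a known feature map
P(s' \mid s, a) = \langle \phi(s' \mid s, a), \theta^{*} \rangle,
\qquad \theta^{*} \in \mathbb{R}^{d} \text{ unknown}.

% Cumulative regret over K online episodes in the target environment
\mathrm{Regret}(K) = \sum_{k=1}^{K}
  \left( V_{1}^{*}(s_{1}^{k}) - V_{1}^{\pi_{k}}(s_{1}^{k}) \right).
```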