A Direct Approach for Handling Contextual Bandits with Latent State Dynamics
TLDR
This paper introduces a direct approach for contextual bandits with latent state dynamics, achieving stronger high-probability regret bounds.
Key contributions
- Proposes a more natural model for contextual bandits with latent hidden Markov chain (HMM) states.
- Achieves stronger, high-probability regret bounds for the proposed model.
- Develops a fully adaptive strategy that estimates HMM parameters online.
- Regret bounds do not depend on the reward functions and depend on the model only through the estimation of the HMM parameters, simplifying the analysis.
Why it matters
This work addresses limitations in prior approaches to contextual bandits with latent state dynamics by proposing a more realistic model and achieving stronger theoretical guarantees. Its fully adaptive strategy and simplified regret bounds offer significant advancements for online learning in complex, hidden-state environments.
Original Abstract
We revisit the finite-armed linear bandit model by Nelson et al. (2022), where contexts and rewards are governed by a finite hidden Markov chain. Nelson et al. (2022) approach this model by a reduction to linear contextual bandits; but to do so, they actually introduce a simplification in which rewards are linear functions of the posterior probabilities over the hidden states given the observed contexts, rather than functions of the hidden states themselves. Their analysis (but not their algorithm) also does not take into account the estimation of the HMM parameters, and only tackles expected, not high-probability, bounds, which suffer in addition from unnecessary complex dependencies on the model (like reward gaps). We instead study the more natural model incorporating direct dependencies in the hidden states (on top of dependencies on the observed contexts, as is natural for contextual bandits) and also obtain stronger, high-probability, regret bounds for a fully adaptive strategy that estimates HMM parameters online. These bounds do not depend on the reward functions and only depend on the model through the estimation of the HMM parameters.
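The core object in this model is the learner's posterior (belief) over the hidden Markov states given the observed contexts, maintained by forward filtering. As a hedged illustration only (the matrices and likelihoods below are toy values, not from the paper), a single belief-update step looks like:

```python
import numpy as np

def belief_update(belief, transition, emission_likelihood):
    """One HMM forward-filtering step: propagate the belief over hidden
    states through the transition matrix, then reweight by the likelihood
    of the observed context under each hidden state."""
    predicted = transition.T @ belief          # prior over the next hidden state
    unnorm = predicted * emission_likelihood   # reweight by the observed context
    return unnorm / unnorm.sum()               # normalized posterior

# Toy 2-state example (illustrative numbers only).
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])                     # hidden-state transition matrix
b = np.array([0.5, 0.5])                       # current belief over hidden states
lik = np.array([0.7, 0.1])                     # P(observed context | hidden state)
posterior = belief_update(b, A, lik)
```

In the simplification attributed to Nelson et al. (2022), rewards are linear in a posterior like the one computed above; the model studied here instead lets rewards depend on the hidden states themselves, with the transition matrix `A` and emission likelihoods estimated online rather than known.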