Concave Statistical Utility Maximization Bandits via Influence-Function Gradients
Matías Carrasco, Alejandro Cholaquidis
TLDR
This paper introduces a bandit algorithm that uses influence-function gradients to optimize concave statistical utilities of the long-run reward distribution, rather than expected reward alone.
Key contributions
- Optimizes concave statistical utilities of long-run reward distributions in multi-armed bandits.
- Derives stochastic gradient estimators using influence-function calculus from bandit feedback.
- Proposes an entropic mirror-ascent algorithm with multiplicative-weights updates (see the sketch after this list).
- Establishes regret bounds separating the mirror-ascent optimization error from the influence-function estimation bias.
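To make these mechanics concrete, below is a minimal Python sketch of the loop the contributions describe: multiplicative-weights (entropic mirror-ascent) updates driven by a plug-in influence function for a variance objective. This is an illustrative reading of the abstract, not the paper's algorithm: the function names are ours, the truncated-simplex step is done EXP3-style by mixing with the uniform distribution, and the importance-weighted gradient coordinate is one standard construction that the paper may implement differently.

```python
import numpy as np

def variance_influence(x, mu, var):
    # Classical influence function of the variance functional, evaluated
    # with plug-in estimates of the mixture's mean and variance:
    # psi_P(x) = (x - mu_P)^2 - Var_P(X).
    return (x - mu) ** 2 - var

def mirror_ascent_bandit(arms, horizon=5000, eta=0.01, gamma=0.01, rng=None):
    """Entropic mirror ascent over mixed policies, maximizing the variance
    of the long-run mixture reward distribution. `arms` is a list of
    zero-argument callables, each returning one stochastic reward
    (assumed bounded, so the exponentiated updates stay moderate)."""
    rng = rng or np.random.default_rng(0)
    k = len(arms)
    w = np.full(k, 1.0 / k)             # start at the uniform mixed policy
    mu, var, n = 0.0, 0.0, 0            # running plug-in estimates of P^w

    for _ in range(horizon):
        a = rng.choice(k, p=w)          # draw an arm from the mixed policy
        x = arms[a]()                   # bandit feedback: one reward

        # Importance-weighted stochastic gradient: only coordinate `a` is
        # nonzero, reweighted by 1/w[a] so it is unbiased for the gradient
        # E_{P_a}[psi(X)] (up to the plug-in bias in mu and var).
        g = np.zeros(k)
        if n > 1:
            g[a] = variance_influence(x, mu, var) / w[a]

        # Multiplicative-weights step = mirror ascent with entropic mirror
        # map; then mix with uniform to stay on a truncated simplex.
        w = w * np.exp(eta * g)
        w /= w.sum()
        w = (1.0 - k * gamma) * w + gamma

        # Welford-style running estimates of the mixture's mean and variance.
        n += 1
        delta = x - mu
        mu += delta / n
        var += (delta * (x - mu) - var) / n
    return w
```

As a quick sanity check, with equal-mean arms such as `arms = [lambda s=s: float(rng.normal(0.0, s)) for s in (0.5, 1.0, 2.0)]` (for some `rng = np.random.default_rng(1)`), the mixture variance reduces to the weighted sum of arm variances, so the weights should concentrate, up to the gamma-truncation, on the widest arm.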
Why it matters
Traditional bandits maximize expected reward; this work extends them to general statistical functionals of the reward distribution, such as variance and Wasserstein objectives. Influence-function gradients provide a principled way to estimate ascent directions for such utilities directly from bandit feedback, opening new avenues for risk-aware sequential decision-making.
Original Abstract
We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we show that the infinite-horizon problem reduces to optimizing over stationary mixed policies: each weight vector \(w\) on the simplex induces a mixture law \(P^w\), and performance is measured by the concave utility \(U(w)=\mathfrak U(P^w)\). For differentiable statistical utilities, we use influence-function calculus to derive stochastic gradient estimators from bandit feedback. This leads to an entropic mirror-ascent algorithm on a truncated simplex, implemented through multiplicative-weights updates and plug-in estimates of the influence function. We establish regret bounds that separate the mirror-ascent optimization error from the bias caused by estimating the influence function. The framework is developed for general concave distributional utilities and illustrated through variance and Wasserstein objectives, with numerical experiments comparing exact and plug-in influence-function implementations.
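To spell out the key identity behind the gradient estimators, here is a short derivation in the abstract's notation for the variance example; this gloss is ours, not quoted from the paper.

```latex
% Influence-function gradient for the variance utility (our gloss).
% Setup from the abstract: weights w on the simplex, mixture law
% P^w = \sum_a w_a P_a, utility U(w) = \mathfrak{U}(P^w).
\[
  \mathfrak{U}(P) = \operatorname{Var}_P(X), \qquad
  \psi_P(x) = (x - \mu_P)^2 - \operatorname{Var}_P(X),
\]
% where \psi_P is the classical influence function of the variance,
% normalized so that \mathbb{E}_P[\psi_P(X)] = 0. Since P^w is linear
% in w, the (simplex-projected) gradient has coordinates
\[
  \frac{\partial U}{\partial w_a}
  = \int \psi_{P^w}(x)\,\mathrm{d}P_a(x)
  = \mathbb{E}_{X \sim P_a}\!\left[(X - \mu_{P^w})^2
      - \operatorname{Var}_{P^w}(X)\right].
\]
```

A single reward from arm \(a\) therefore yields a stochastic gradient coordinate once the mean and variance of \(P^w\) are replaced by plug-in estimates; that replacement is precisely the source of the estimation bias the regret bounds separate from the optimization error.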