ArXiv TLDR

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

arXiv:2605.00762

Shradha Sharma, Swapnil Dhamal, Shweta Jain

cs.LG, cs.AI, cs.MA

TLDR

Introduces a meritocratic fairness framework for budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF), built on K-Shapley values and the K-SVFair-FBF algorithm.

Key contributions

  • Proposes a meritocratic fairness framework for budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF).
  • Extends the Shapley value to the K-Shapley value, which captures an agent's marginal contribution restricted to coalitions of size at most K, and shows it is the unique solution concept satisfying Symmetry, Linearity, Null player, and Efficiency.
  • Introduces the K-SVFair-FBF algorithm, which estimates K-Shapley values while learning the unknown valuation function under full-bandit feedback.
  • Proves an O(T^(3/4)) bound on fairness regret and outperforms existing baselines in experiments.
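The K-Shapley idea in the second bullet can be illustrated with a small sketch. The paper's exact estimator is not given here; this is a hypothetical Monte Carlo approximation in which, for each sampled ordering of agents, only the first K agents accrue a marginal contribution, so all evaluated coalitions have size at most K. The function name `k_shapley_mc` and the sampling scheme are assumptions for illustration, not the authors' algorithm.

```python
import random

def k_shapley_mc(agents, value_fn, K, n_samples=2000, seed=0):
    """Monte Carlo sketch of a K-restricted Shapley value.

    Hypothetical scheme: for each sampled permutation, agents beyond
    position K contribute nothing in that sample, so every coalition
    passed to value_fn has size at most K.
    """
    rng = random.Random(seed)
    phi = {a: 0.0 for a in agents}
    for _ in range(n_samples):
        perm = list(agents)
        rng.shuffle(perm)
        coalition = []
        prev = value_fn(coalition)  # value of the empty coalition
        for a in perm[:K]:  # coalition size capped at K
            coalition.append(a)
            cur = value_fn(coalition)
            phi[a] += cur - prev  # marginal contribution of agent a
            prev = cur
    return {a: total / n_samples for a, total in phi.items()}
```

With an additive valuation and K equal to the number of agents, this recovers each agent's weight exactly, and the estimates sum to the grand-coalition value (the Efficiency property). In the bandit setting the valuation function is itself unknown and must be learned from noisy full-bandit feedback, which is what K-SVFair-FBF additionally handles.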

Why it matters

This paper addresses the challenging problem of meritocratic fairness in full-bandit feedback, where individual arm contributions are hard to discern. It provides a robust framework and algorithm for fair resource allocation in complex systems like federated learning and social influence maximization.

Original Abstract

We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributions in BCMAB-FBF, we first extend the Shapley value, a classical solution concept from cooperative game theory, to the $K$-Shapley value, which captures the marginal contribution of an agent restricted to a set of size at most $K$. We show that $K$-Shapley value is a unique solution concept that satisfies Symmetry, Linearity, Null player, and efficiency properties. We next propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates $K$-Shapley value with unknown valuation function. Unlike standard bandit literature on full bandit feedback, K-SVFair-FBF not only learns the valuation function under full feedback setting but also mitigates the noise arising from Monte Carlo approximations. Theoretically, we prove that K-SVFair-FBF achieves $O(T^{3/4})$ regret bound on fairness regret. Through experiments on federated learning and social influence maximization datasets, we demonstrate that our approach achieves fairness and performs more effectively than existing baselines.
