ArXiv TLDR

Meritocratic Fairness in Budgeted Combinatorial Multi-armed Bandits via Shapley Values

arXiv:2605.00762

Shradha Sharma, Swapnil Dhamal, Shweta Jain

cs.LG, cs.AI, cs.MA

TLDR

Introduces a meritocratic fairness framework for budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF), built on K-Shapley values and the K-SVFair-FBF algorithm.

Key contributions

  • Proposes a meritocratic fairness framework for budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF).
  • Extends the Shapley value to the K-Shapley value, which captures an agent's marginal contribution restricted to coalitions of size at most K, and shows it is the unique solution concept satisfying Symmetry, Linearity, Null player, and Efficiency.
  • Introduces the K-SVFair-FBF algorithm, which estimates K-Shapley values while learning the unknown valuation function under full-bandit feedback.
  • Proves an O(T^(3/4)) bound on fairness regret and outperforms existing baselines in experiments.
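The K-Shapley idea in the second bullet can be illustrated with a small sketch. The paper's exact estimator is not given here; this is a hypothetical Monte Carlo approximation in which, for each sampled ordering of agents, only the first K agents accrue a marginal contribution, so all evaluated coalitions have size at most K. The function name `k_shapley_mc` and the sampling scheme are assumptions for illustration, not the authors' algorithm.

```python
import random

def k_shapley_mc(agents, value_fn, K, n_samples=2000, seed=0):
    """Monte Carlo sketch of a K-restricted Shapley value.

    Hypothetical scheme: for each sampled permutation, agents beyond
    position K contribute nothing in that sample, so every coalition
    passed to value_fn has size at most K.
    """
    rng = random.Random(seed)
    phi = {a: 0.0 for a in agents}
    for _ in range(n_samples):
        perm = list(agents)
        rng.shuffle(perm)
        coalition = []
        prev = value_fn(coalition)  # value of the empty coalition
        for a in perm[:K]:  # coalition size capped at K
            coalition.append(a)
            cur = value_fn(coalition)
            phi[a] += cur - prev  # marginal contribution of agent a
            prev = cur
    return {a: total / n_samples for a, total in phi.items()}
```

With an additive valuation and K equal to the number of agents, this recovers each agent's weight exactly, and the estimates sum to the grand-coalition value (the Efficiency property). In the bandit setting the valuation function is itself unknown and must be learned from noisy full-bandit feedback, which is what K-SVFair-FBF additionally handles.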

Why it matters

This paper addresses the challenging problem of meritocratic fairness in full-bandit feedback, where individual arm contributions are hard to discern. It provides a robust framework and algorithm for fair resource allocation in complex systems like federated learning and social influence maximization.

Original Abstract

We propose a new framework for meritocratic fairness in budgeted combinatorial multi-armed bandits with full-bandit feedback (BCMAB-FBF). Unlike semi-bandit feedback, the contribution of individual arms is not received in full-bandit feedback, making the setting significantly more challenging. To compute arm contributions in BCMAB-FBF, we first extend the Shapley value, a classical solution concept from cooperative game theory, to the $K$-Shapley value, which captures the marginal contribution of an agent restricted to a set of size at most $K$. We show that $K$-Shapley value is a unique solution concept that satisfies Symmetry, Linearity, Null player, and efficiency properties. We next propose K-SVFair-FBF, a fairness-aware bandit algorithm that adaptively estimates $K$-Shapley value with unknown valuation function. Unlike standard bandit literature on full bandit feedback, K-SVFair-FBF not only learns the valuation function under full feedback setting but also mitigates the noise arising from Monte Carlo approximations. Theoretically, we prove that K-SVFair-FBF achieves $O(T^{3/4})$ regret bound on fairness regret. Through experiments on federated learning and social influence maximization datasets, we demonstrate that our approach achieves fairness and performs more effectively than existing baselines.
