ArXiv TLDR

First-Order Efficiency for Probabilistic Value Estimation via A Statistical Viewpoint

arXiv: 2605.02827

Ziqi Liu, Kiljae Lee, Yuan Zhang, Weijing Tang

cs.AI · stat.ME · stat.ML

TLDR

This paper introduces EASE, an estimator of probabilistic values that minimizes the first-order MSE by jointly choosing the sampling law and the surrogate function.

Key contributions

  • Reveals a common first-order error structure shared by existing probabilistic value estimators (see the sketch after this list).
  • Provides an explicit expression for the leading MSE, showing how the sampling law and the surrogate jointly determine statistical efficiency.
  • Introduces EASE (Efficiency-Aware Surrogate-adjusted Estimator), which chooses the sampling law and surrogate to minimize the first-order MSE.
  • Demonstrates that EASE consistently outperforms state-of-the-art estimators across a range of probabilistic values.
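
A hedged sketch of the shared structure (our notation, reconstructed from the abstract, not necessarily the paper's): a semivalue of player i under utility v can be written as

    \phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} w(|S|) \left( v(S \cup \{i\}) - v(S) \right),

where w is a weight over coalition sizes (for the Shapley value, w(s) = s!(n-1-s)!/n!). If coalitions S_1, \dots, S_m are drawn from a sampling law q and \tilde{v} is a working surrogate, an augmented inverse-probability weighted estimator takes the form

    \hat{\phi}_i = \phi_i(\tilde{v}) + \frac{1}{m} \sum_{k=1}^{m} \frac{w(|S_k|)}{q(S_k)} \left( \Delta_i v(S_k) - \Delta_i \tilde{v}(S_k) \right),
    \qquad \text{where } \Delta_i v(S) = v(S \cup \{i\}) - v(S),

and its leading MSE is \frac{1}{m} \mathrm{Var}_{S \sim q}\left[ \frac{w(|S|)}{q(S)} \left( \Delta_i v(S) - \Delta_i \tilde{v}(S) \right) \right]. This is the kind of criterion that, per the abstract, the sampling law q and the surrogate \tilde{v} jointly determine and that EASE is designed to minimize; the paper's exact expression may differ.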

Why it matters

Probabilistic values underpin explainable AI and data valuation, but their exact computation requires utility evaluations over exponentially many coalitions, and existing Monte Carlo estimators can be statistically inefficient. By characterizing and then optimizing the leading error term, this paper offers a principled way to improve estimation efficiency, leading to more accurate and reliable model attributions.

Original Abstract

Probabilistic values, including Shapley values and semivalues, provide a model-agnostic framework to attribute the behavior of a black-box model to data points or features, with a wide range of applications including explainable artificial intelligence and data valuation. However, their exact computation requires utility evaluations over exponentially many coalitions, making Monte Carlo approximation essential in modern machine learning applications. Existing estimators are often developed through different identification strategies, including weighted averages, self-normalized weighting, regression adjustment, and weighted least squares. Our key observation is that these seemingly distinct constructions share a common first-order error structure, in which the leading term is an augmented inverse-probability weighted influence term determined by the sampling law and a working surrogate function. This first-order representation yields an explicit expression for the leading mean squared error (MSE), which characterizes how the sampling law and the surrogate jointly determine statistical efficiency. Guided by this criterion, we propose an Efficiency-Aware Surrogate-adjusted Estimator (EASE) that directly chooses the sampling law and surrogate to minimize the first-order MSE. We demonstrate that EASE consistently outperforms state-of-the-art estimators for various probabilistic values.
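
As a complement to the abstract, below is a minimal, runnable Python sketch of the general idea for the Shapley value: a plain Monte Carlo estimator alongside a surrogate-adjusted (control-variate / AIPW-style) variant. The toy utility, the additive surrogate, and the uniform-size sampling law are illustrative assumptions; this is not the paper's EASE estimator, which additionally chooses the sampling law and surrogate to minimize the first-order MSE.

    import math
    import random
    from itertools import combinations

    def shapley_weight(s, n):
        """Shapley coalition-size weight w(s) = s!(n-1-s)!/n!."""
        return math.factorial(s) * math.factorial(n - 1 - s) / math.factorial(n)

    def exact_shapley(i, players, utility):
        """Exact Shapley value of player i (exponential cost; small n only)."""
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(len(others) + 1):
            for S in combinations(others, size):
                coalition = set(S)
                total += shapley_weight(size, len(players)) * (
                    utility(coalition | {i}) - utility(coalition))
        return total

    def mc_shapley(i, players, utility, surrogate=None, num_samples=2000, seed=0):
        """Monte Carlo Shapley estimate of player i.

        Coalitions are drawn by picking a uniform size and then a uniform
        subset of that size; this sampling law matches the Shapley weights,
        so the importance ratio w(|S|)/q(S) equals 1.  When a cheap surrogate
        is given, its exact Shapley value is added back and its sampled
        marginal contributions are subtracted (a control-variate / AIPW-style
        adjustment, loosely in the spirit of the abstract).
        """
        rng = random.Random(seed)
        others = [p for p in players if p != i]
        adjustment = exact_shapley(i, players, surrogate) if surrogate else 0.0
        total = 0.0
        for _ in range(num_samples):
            size = rng.randrange(len(others) + 1)      # uniform coalition size
            S = set(rng.sample(others, size))          # uniform subset of that size
            delta = utility(S | {i}) - utility(S)
            if surrogate:
                delta -= surrogate(S | {i}) - surrogate(S)
            total += delta
        return adjustment + total / num_samples

    if __name__ == "__main__":
        # Toy utility: additive player values plus one pairwise interaction.
        players = list(range(6))
        values = [1.0, 2.0, 0.5, 3.0, 1.5, 0.2]
        utility = lambda S: sum(values[p] for p in S) + (0.7 if {0, 3} <= S else 0.0)
        surrogate = lambda S: sum(values[p] for p in S)   # cheap additive surrogate
        print("exact        :", round(exact_shapley(0, players, utility), 4))
        print("plain MC     :", round(mc_shapley(0, players, utility), 4))
        print("surrogate-adj:", round(mc_shapley(0, players, utility, surrogate), 4))

In this toy example the surrogate captures the additive part of the utility exactly, so the adjusted estimator only averages the small interaction residual; that variance reduction is the intuition behind optimizing the surrogate.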
