ArXiv TLDR

Revisiting Active Sequential Prediction-Powered Mean Estimation

2604.18569

Maria-Eleni Sfyraki, Jun-Kun Wang

stat.ML cs.LG

TLDR

This paper revisits active sequential prediction-powered mean estimation, finding that a constant query probability often yields tighter confidence intervals than uncertainty-based querying.

Key contributions

  • Revisits active sequential prediction-powered mean estimation, combining ML predictions with queried labels.
  • Observes empirically that placing most of the weight on the constant query probability, rather than the uncertainty-based component, often yields the smallest confidence width.
  • Develops a non-asymptotic analysis and data-dependent confidence interval bound for the estimator.
  • Shows that under no-regret learning, the covariate-oblivious query probability converges to its maximum-value constraint, corroborating the empirical findings.
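To make the setting concrete, here is a minimal sketch (not the paper's algorithm) of prediction-powered mean estimation with a constant query probability: each label is queried independently with probability `p`, and unqueried samples fall back on the model's prediction, with an inverse-probability-weighted correction keeping the estimate unbiased. The data-generating choices and the symbols `p`, `f`, `xi` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y = rng.normal(2.0, 1.0, n)      # ground-truth labels (true mean = 2.0)
f = y + rng.normal(0.0, 0.5, n)  # noisy ML predictions of the labels

p = 0.3                          # constant query probability (illustrative)
xi = rng.random(n) < p           # query indicators: 1 if the label was queried

# Prediction-powered estimate: model prediction plus an
# inverse-probability-weighted residual correction on queried samples.
# E[f + (xi/p)(y - f)] = E[y], so the estimator is unbiased.
est = np.mean(f + (xi / p) * (y - f))
print(est)
```

Because the correction term has mean zero, `est` concentrates around the true label mean while only a `p` fraction of labels is ever queried; the paper's question is how the width of the resulting confidence interval depends on how `p` is chosen.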

Why it matters

This research challenges the intuition that uncertainty-based querying is always superior in active learning. It provides a strong theoretical and empirical basis for using simpler, constant query probabilities, which could lead to more efficient and robust data collection strategies.

Original Abstract

In this work, we revisit the problem of active sequential prediction-powered mean estimation, where at each round one must decide the query probability of the ground-truth label upon observing the covariates of a sample. Furthermore, if the label is not queried, the prediction from a machine learning model is used instead. Prior work proposed an elegant scheme that determines the query probability by combining an uncertainty-based suggestion with a constant probability that encodes a soft constraint on the query probability. We explored different values of the mixing parameter and observed an intriguing empirical pattern: the smallest confidence width tends to occur when the weight on the constant probability is close to one, thereby reducing the influence of the uncertainty-based component. Motivated by this observation, we develop a non-asymptotic analysis of the estimator and establish a data-dependent bound on its confidence interval. Our analysis further suggests that when a no-regret learning approach is used to determine the query probability and control this bound, the query probability converges to the constraint of the max value of the query probability when it is chosen obliviously to the current covariates. We also conduct simulations that corroborate these theoretical findings.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.