Learning U-Statistics with Active Inference
Xiaoning Wang, Yuyang Huo, Liuhua Peng, Changliang Zou
TLDR
An active inference framework for U-statistics improves estimation efficiency by selectively querying informative labels under budget constraints.
Key contributions
- Introduces an active inference framework to reduce label costs for U-statistics.
- Leverages augmented inverse probability weighting for efficient label querying.
- Characterizes optimal sampling rules and provides practical strategies.
- Extends the method to U-statistic-based empirical risk minimization.
Why it matters
Label acquisition for U-statistics is often expensive. This work develops an active inference framework that substantially improves estimation efficiency under a limited labeling budget while maintaining statistical validity, making U-statistics more practical for modern applications.
Original Abstract
$U$-statistics play a central role in statistical inference. In many modern applications, however, acquiring the labels required for $U$-statistics is costly. Motivated by recent advances in active inference, we develop an active inference framework for $U$-statistics that selectively queries informative labels to improve estimation efficiency under a fixed labeling budget, while preserving valid statistical inference. Our approach is built on the augmented inverse probability weighting $U$-statistic, which is designed to incorporate the sampling rule and machine learning predictions. We characterize the optimal sampling rule that minimizes its variance and design practical sampling strategies. We further extend the framework to $U$-statistic-based empirical risk minimization. Experiments on real datasets demonstrate substantial gains in estimation efficiency over baseline methods, while maintaining target coverage.
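To make the augmented inverse probability weighting (AIPW) idea concrete, here is a minimal toy sketch in the simplest (order-1, mean-estimation) setting, not the paper's actual order-m U-statistic construction. All names (`pi`, `xi`, `f`) and the uniform sampling rule are illustrative assumptions; the paper instead characterizes an optimal, non-uniform sampling rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
y = rng.normal(loc=2.0, scale=1.0, size=n)   # true labels (costly to query)
f = y + rng.normal(scale=0.5, size=n)        # cheap ML predictions of y

# Illustrative sampling rule: query each label with probability pi_i
# (uniform here; the paper derives a variance-minimizing rule).
pi = np.full(n, 0.2)                         # ~20% labeling budget
xi = rng.random(n) < pi                      # indicator: label was queried

# AIPW estimator: prediction term plus an inverse-probability-weighted
# correction from the queried labels; unbiased for E[y] whenever pi > 0,
# even if the predictions f are biased.
theta_aipw = np.mean(f + (xi / pi) * (y - f))

# Naive alternatives for comparison
theta_pred = f.mean()        # biased whenever the predictions are biased
theta_labeled = y[xi].mean() # discards the unlabeled 80% of the data
```

The key design point mirrored from the abstract: the correction term restores unbiasedness regardless of prediction quality, while good predictions shrink the residual `y - f` and hence the variance, which is exactly what querying "informative" labels under a budget exploits.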