ArXiv TLDR

Calibeating Prediction-Powered Inference

arXiv:2604.21260

Lars van der Laan, Mark van der Laan

stat.ML · cs.AI · cs.LG · econ.EM · q-bio.QM · stat.ME

TLDR

Calibeating Prediction-Powered Inference introduces a method that post-hoc calibrates black-box prediction scores on labeled data, improving the efficiency of semisupervised mean estimation.

Key contributions

  • Introduces Calibrated Prediction-Powered Inference (CPPI) for semisupervised mean estimation.
  • Post-hoc calibrates black-box prediction scores on labeled data without retraining, boosting efficiency.
  • Establishes first-order optimality guarantees for isotonic calibration, outperforming simpler rules.
  • Clarifies relationships among existing estimators like PPI, AIPW, and PPI++.

Why it matters

This paper addresses the challenge of miscalibrated prediction models in semisupervised learning. By introducing a simple, no-retraining calibration step, it significantly boosts the efficiency and accuracy of mean estimation. This provides a practical and theoretically sound approach to leverage black-box models more effectively.
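The core recipe — fit an isotonic (monotone) calibration map on the labeled sample, then combine the calibrated predictions with a labeled-sample bias correction — can be sketched in a few lines of pure Python. This is a toy illustration of the idea under simplifying assumptions, not the paper's `ppi_aipw` package; the `pava` and `calibrated` helpers and the synthetic data are illustrative inventions.

```python
import bisect
import random

def pava(scores, labels):
    """Pool-adjacent-violators: fit a nondecreasing step function of labels vs scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # [mean, count] blocks over the score-sorted labels
    for i in order:
        blocks.append([labels[i], 1])
        # merge adjacent blocks while monotonicity is violated
        while len(blocks) > 1 and blocks[-2][0] >= blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    fitted = [m for m, w in blocks for _ in range(w)]
    cut_scores = [scores[i] for i in order]
    return cut_scores, fitted

def calibrated(score, cut_scores, fitted):
    """Step-function lookup: isotonic-calibrated value for a new score."""
    j = bisect.bisect_right(cut_scores, score) - 1
    return fitted[max(j, 0)]

# Toy data: the raw score is monotone in the outcome but on the wrong scale.
random.seed(0)
scores_lab = [random.random() for _ in range(200)]
y_lab = [s ** 2 + random.gauss(0, 0.05) for s in scores_lab]  # true mean E[Y] = 1/3
scores_unlab = [random.random() for _ in range(2000)]

cuts, fit = pava(scores_lab, y_lab)
g = lambda s: calibrated(s, cuts, fit)

# Calibrated PPI-style mean: unlabeled plug-in plus labeled bias correction.
theta = (sum(g(s) for s in scores_unlab) / len(scores_unlab)
         + sum(y - g(s) for s, y in zip(scores_lab, y_lab)) / len(scores_lab))
```

Because isotonic regression assumes only that the score is monotone in the outcome, it repairs an arbitrary scale mismatch without retraining the prediction model; the labeled correction term then guards against any remaining misspecification, in the spirit of AIPW.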

Original Abstract

We study semisupervised mean estimation with a small labeled sample, a large unlabeled sample, and a black-box prediction model whose output may be miscalibrated. A standard approach in this setting is augmented inverse-probability weighting (AIPW) [Robins et al., 1994], which protects against prediction-model misspecification but can be inefficient when the prediction score is poorly aligned with the outcome scale. We introduce Calibrated Prediction-Powered Inference, which post-hoc calibrates the prediction score on the labeled sample before using it for semisupervised estimation. This simple step requires no retraining and can improve the original score both as a predictor of the outcome and as a regression adjustment for semisupervised inference. We study both linear and isotonic calibration. For isotonic calibration, we establish first-order optimality guarantees: isotonic post-processing can improve predictive accuracy and estimator efficiency relative to the original score and simpler post-processing rules, while no further post-processing of the fitted isotonic score yields additional first-order gains. For linear calibration, we show first-order equivalence to PPI++. We also clarify the relationship among existing estimators, showing that the original PPI estimator is a special case of AIPW and can be inefficient when the prediction model is accurate, while PPI++ is AIPW with empirical efficiency maximization [Rubin et al., 2008]. In simulations and real-data experiments, our calibrated estimators often outperform PPI and are competitive with, or outperform, AIPW and PPI++. We provide an accompanying Python package, ppi_aipw, at https://larsvanderlaan.github.io/ppi-aipw/.
