ArXiv TLDR

Buying Data of Unknown Quality: Fisher Information Procurement Auctions

arXiv:2604.08821

Yuchen Hu, Martin J. Wainwright, Stephen Bates

cs.GT, econ.TH, stat.ME

TLDR

This paper introduces mechanisms for buyers to procure data for parameter estimation from providers with unknown quality, ensuring truthful reporting.

Key contributions

  • Defines a cost-per-information score for data providers when quality is known ex ante.
  • Proposes a second-score procurement mechanism for known quality, ensuring truthful cost reporting.
  • Introduces a mechanism for private quality, augmenting the second-score rule with an ex post statistical test.
  • Proves that, under mild conditions, an equilibrium exists in which sellers report costs truthfully and misreport quality only by amounts that vanish as the procured sample size grows.

Why it matters

This work informs the design of data markets in which quality cannot be verified before purchase. It gives buyers concrete mechanisms for choosing a provider and a sample size, and it gives sellers an incentive to report costs truthfully and quality almost truthfully.

Original Abstract

We study statistical parameter estimation in the setting of data markets. A buyer seeks to estimate a parameter based on samples that can be purchased from competing providers that differ in their data quality and provision costs. When quality is known ex ante, we define a cost-per-information score that summarizes each provider's provision cost per unit of information about the buyer's estimation objective. We describe a second-score procurement mechanism that ranks providers by this score, and endogenously chooses both a provider and a sample size while making truthful cost reports optimal. We then turn to the more realistic setting where data quality is private, and can only be indirectly observed via the delivered data. In this setting, we propose a simple mechanism that augments the second-score rule with a lenient ex post statistical test of the reported quality. We prove that under mild conditions, there exists an equilibrium in which sellers report costs truthfully and report quality up to deviations that vanish as the procured sample size grows. Our analysis highlights how the choice of verification test and the buyer's accuracy-cost tradeoff jointly shape participation and misreporting incentives in data markets.
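
To make the known-quality case concrete, here is a minimal Python sketch of a second-score auction over cost-per-information scores. It is an illustration under stated assumptions, not the paper's exact mechanism: the abstract does not give formulas, so the score (reported cost per sample divided by Fisher information per sample), the buyer's objective (1 / total information plus payment), and all names (`Bid`, `score`, `second_score_auction`) are hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Bid:
    name: str
    cost_per_sample: float  # cost reported by the provider
    info_per_sample: float  # Fisher information per sample, assumed known ex ante


def score(bid: Bid) -> float:
    """Assumed score: reported cost per unit of Fisher information."""
    return bid.cost_per_sample / bid.info_per_sample


def second_score_auction(bids: list[Bid]) -> tuple[str, int, float]:
    """Pick the lowest-score provider and pay at the runner-up's score.

    Assumption for this sketch: the buyer minimizes 1 / J + price * J,
    where J is the total information bought, so the target is
    J = 1 / sqrt(price). The price depends only on the runner-up's
    score, so the winner cannot change its payment by shading its bid.
    """
    if len(bids) < 2:
        raise ValueError("a second-score rule needs at least two bids")
    ranked = sorted(bids, key=score)
    winner, runner_up = ranked[0], ranked[1]
    price = score(runner_up)  # price per unit of information
    target_info = 1.0 / math.sqrt(price)
    n_samples = math.ceil(target_info / winner.info_per_sample)
    payment = price * n_samples * winner.info_per_sample
    return winner.name, n_samples, payment


if __name__ == "__main__":
    bids = [Bid("A", 0.02, 4.0), Bid("B", 0.01, 1.0), Bid("C", 0.03, 2.0)]
    print(second_score_auction(bids))
```

In this sketch the winner's own report affects only whether it wins, not the sample size or the payment, which mirrors the usual second-price logic behind truthful cost reporting. The private-quality setting with an ex post statistical test is not modeled here.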
