ZK-Value: A Practical Zero-Knowledge System for Verifiable Data Valuation
Zhaoyu Wang, Pingchuan Ma, Zhantong Xue, Yuguang Zhou, Qixin Zhang + 2 more
TLDR
ZK-Value is a practical zero-knowledge system for verifiable data valuation that scales to real-world demands using a co-designed architecture.
Key contributions
- Introduces LSH-Shapley, a locality-based valuation primitive using per-bucket collision counts for efficiency.
- Develops ZK-LSH-Shapley, a tailored ZKP protocol that drastically reduces witness size via histogram encoding.
- Applies structural proof-system optimizations like super-oracle batching and sparsity skipping for speed.
- Achieves valuation quality comparable to baselines while generating proofs 12.6x-68.1x faster than ZK alternatives.
Why it matters
Data marketplaces need verifiable data valuation without compromising privacy, but existing ZK systems are too slow. ZK-Value offers a practical, scalable solution, enabling secure and transparent data valuation for real-world applications. This bridges the critical gap between privacy and verifiability.
Original Abstract
Data valuation is a foundational task in data marketplaces, where a Shapley-value attribution determines how a buyer's payment is distributed among data providers. Typically, the marketplace operator runs this attribution alone, requiring participants and external auditors to trust scores they cannot independently recompute on the underlying private data. While zero-knowledge proofs (ZKPs) can theoretically reconcile this conflict between privacy and verifiability, existing ZK valuation systems fail to scale to real-world marketplace demands due to prohibitive proving times or the requirement to disclose validation cohorts. We present ZK-Value, a practical, end-to-end ZK data-valuation system. Our solution bridges the scalability gap through a fully co-designed architecture: (1) LSH-Shapley, a locality-based valuation primitive that replaces expensive pairwise distance metrics with per-bucket collision counts; (2) ZK-LSH-Shapley, a tailored ZKP protocol that drastically reduces witness size by encoding these counts into bucket-level histograms rather than naive per-pair tensors; and (3) structural proof-system optimizations, specifically super-oracle batching and sparsity skipping. Evaluated across 12 standard datasets, ZK-Value delivers valuation quality on par with state-of-the-art baselines (within 0.033 AUROC of exact KNN-Shapley), while generating proofs in seconds to minutes and outperforming specialized ZK baselines by 12.6x to 68.1x in proving time, with verification in under 4.6 s.
📬 Weekly AI Paper Digest
Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.