VOW: Verifiable and Oblivious Watermark Detection for Large Language Models
Xiaokun Luan, Yihao Zhang, Pengcheng Su, Feiran Lei, Meng Sun
TLDR
VOW introduces a privacy-preserving and cryptographically verifiable watermark detection protocol for LLMs, addressing both the sensitivity of user text and the integrity of detection results.
Key contributions
- Achieves privacy-preserving and verifiable LLM watermark detection.
- Uses secure two-party computation with a Verifiable Oblivious Pseudorandom Function (VOPRF).
- Enables detection without revealing user text and verifies the provider's results.
- Remains practical even for short texts and reassesses watermark robustness against modern paraphrasing attacks.
Why it matters
Current LLM watermarking forces users to reveal sensitive text and lacks result verification. VOW solves this by enabling private and verifiable detection, making LLM provenance more trustworthy and practical. This is crucial for secure and ethical AI text usage.
Original Abstract
Large Language Model (LLM) watermarking is crucial for establishing the provenance of machine-generated text, but most existing methods rely on a centralized trust model. This model forces users to reveal potentially sensitive text to a provider for detection and offers no way to verify the integrity of the result. While asymmetric schemes have been proposed to address these issues, they are either impractical for short texts or lack formal guarantees linking watermark insertion and detection. We propose VOW, a new protocol that achieves both privacy-preserving and cryptographically verifiable watermark detection with high efficiency. Our approach formulates detection as a secure two-party computation problem, instantiating the watermark's core logic with a Verifiable Oblivious Pseudorandom Function (VOPRF). This allows the user and provider to perform detection without the user's text being revealed, while the provider's result is verifiable. Our comprehensive evaluation shows that VOW is practical for short texts and provides a crucial reassessment of watermark robustness against modern paraphrasing attacks.
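The abstract describes instantiating the watermark's core logic with a Verifiable Oblivious Pseudorandom Function: the user blinds an input derived from their text, the provider evaluates the PRF on the blinded value under its secret key, and a proof lets the user verify the evaluation without the provider ever seeing the plaintext. The paper does not specify its exact construction, so the following is only a minimal illustrative sketch of a classic DH-based VOPRF (blinded exponentiation with a Chaum-Pedersen DLEQ proof) over a toy group; the parameters, hash-to-group map, and variable names are all assumptions for exposition, not the authors' protocol.

```python
import hashlib
import secrets

# Toy safe-prime group, for illustration only; real VOPRFs use
# standardized prime-order groups (e.g. RFC 9497's ciphersuites).
p = 467          # safe prime: p = 2q + 1
q = 233          # prime order of the quadratic-residue subgroup
g = 4            # generator of the order-q subgroup

def hash_to_group(msg: bytes) -> int:
    """Map a message into the order-q subgroup by squaring (illustrative only)."""
    h = int.from_bytes(hashlib.sha256(msg).digest(), "big") % (p - 2) + 2
    return pow(h, 2, p)

def challenge(*elems: int) -> int:
    """Fiat-Shamir challenge over the proof transcript."""
    data = b"|".join(str(e).encode() for e in elems)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

# --- Provider's key pair ---
k = secrets.randbelow(q - 1) + 1     # secret PRF key
pk = pow(g, k, p)                    # published verification key

# --- User blinds a text-derived input (hypothetical placeholder) ---
x = b"token context"
Hx = hash_to_group(x)
r = secrets.randbelow(q - 1) + 1     # blinding factor
B = pow(Hx, r, p)                    # only B is sent to the provider

# --- Provider evaluates on the blinded input and proves correctness ---
E = pow(B, k, p)                     # E = B^k
t = secrets.randbelow(q - 1) + 1
A1, A2 = pow(g, t, p), pow(B, t, p)  # Chaum-Pedersen commitments
c = challenge(g, pk, B, E, A1, A2)
s = (t - c * k) % q                  # proof response

# --- User verifies the DLEQ proof: log_g(pk) == log_B(E) ---
ok = (pow(g, s, p) * pow(pk, c, p) % p == A1 and
      pow(B, s, p) * pow(E, c, p) % p == A2)
assert ok, "provider's evaluation proof failed"

# --- User unblinds: ((Hx^r)^k)^(1/r) = Hx^k ---
y = pow(E, pow(r, -1, q), p)
assert y == pow(Hx, k, p)            # PRF output obtained without revealing x
```

The blinding hides the user's input from the provider, while the DLEQ proof lets the user check that the provider really used the key behind its published `pk`, matching the two guarantees the paper targets: privacy of the text and verifiability of the result.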