ArXiv TLDR

SSG: Logit-Balanced Vocabulary Partitioning for LLM Watermarking

arXiv: 2604.22438

Chenxi Gu, Xiaoning Du, John Grundy

cs.CR cs.AI cs.CL

TLDR

SSG introduces a logit-balanced vocabulary partitioning method to significantly improve LLM watermarking detectability, especially in low-entropy contexts.

Key contributions

  • Identifies that the next-token probability distribution determines the achievable watermark strength, which limits the effectiveness of KGW watermarking.
  • Proposes SSG (Sort-then-Split by Groups), a novel vocabulary partitioning method that creates logit-balanced subsets.
  • Shows that SSG lifts the lower bound of watermark strength for each token prediction, enhancing watermark detectability.
  • Demonstrates improved LLM watermarking performance on low-entropy tasks such as code generation and mathematical reasoning.
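For context, the KGW scheme that SSG builds on works by pseudorandomly splitting the vocabulary into a "green" and a "red" list at each step, seeded by the preceding token, and adding a small bias to green-list logits. The sketch below is an illustrative reconstruction under assumed parameter names (`gamma` for the green-list fraction, `DELTA` for the logit bias), not the authors' exact implementation:

```python
import random

DELTA = 2.0  # hypothetical bias added to green-list logits

def kgw_partition(prev_token_id: int, vocab_size: int, gamma: float = 0.5):
    """Pseudorandomly split the vocabulary into green/red lists,
    seeded by the previous token (KGW-style random partitioning)."""
    rng = random.Random(prev_token_id)  # same previous token -> same split
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    cut = int(gamma * vocab_size)
    return set(ids[:cut]), set(ids[cut:])  # (green list, red list)

def bias_logits(logits, green):
    """Nudge sampling toward the green list by raising its logits."""
    return [l + DELTA if i in green else l for i, l in enumerate(logits)]
```

A detector that knows the seeding rule can recount how often generated tokens fall in the green list; an unusually high green fraction signals the watermark. When the next-token distribution is highly peaked (low entropy), the bias rarely changes which token is sampled, which is the failure mode SSG targets.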

Why it matters

LLM watermarking is crucial for tracing content authorship, but existing methods struggle with low-entropy outputs. SSG offers a significant advancement by improving detectability in these challenging scenarios. This makes LLM watermarking more robust and broadly applicable.

Original Abstract

Watermarking has emerged as a promising technique for tracing the authorship of content generated by large language models (LLMs). Among existing approaches, the KGW scheme is particularly attractive due to its versatility, efficiency, and effectiveness in natural language generation. However, KGW's effectiveness degrades significantly under low-entropy settings such as code generation and mathematical reasoning. A crucial step in the KGW method is random vocabulary partitioning, which enables adjustments to token selection based on specific preferences. Our study revealed that the next-token probability distribution plays a critical role in determining how much, or even whether, we can modify token selection and, consequently, the effectiveness of watermarking. We refer to this characteristic, associated with the probability distribution of each token prediction, as \emph{watermark strength}. In cases of random vocabulary partitioning, the lower bound of watermark strength is dictated by the next-token probability distribution. However, we found that, by redesigning the vocabulary partitioning algorithm, we can potentially raise this lower bound. In this paper, we propose SSG (\textbf{S}ort-then-\textbf{S}plit by \textbf{G}roups), a method that partitions the vocabulary into two logit-balanced subsets. This design lifts the lower bound of watermark strength for each token prediction, thereby improving watermark detectability. Experiments on code generation and mathematical reasoning datasets demonstrate the effectiveness of SSG.
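The abstract names the method "Sort-then-Split by Groups" but does not spell out the grouping rule. One plausible reading, sketched below purely as an assumption: sort tokens by logit, walk the sorted order in small groups (pairs here), and send one member of each group to each subset, so that high-logit tokens are spread evenly and the two subsets carry comparable logit mass. The paper's actual grouping and seeding rules may differ:

```python
import random

def ssg_partition(logits, seed: int = 0):
    """Sketch of a sort-then-split-by-groups partition (assumed variant):
    sort token ids by logit, then randomly assign one member of each
    consecutive pair to each subset, balancing logit mass between them."""
    rng = random.Random(seed)  # seed would come from the watermark key
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    a, b = set(), set()
    for k in range(0, len(order), 2):
        pair = order[k:k + 2]
        rng.shuffle(pair)       # coin flip decides which subset gets which
        a.add(pair[0])
        if len(pair) > 1:
            b.add(pair[1])
    return a, b
```

Under this construction, even a sharply peaked distribution places its top token opposite the runner-up, so whichever subset is favored at a given step still holds substantial probability mass; that is one way a partition could lift the lower bound on watermark strength that a purely random split cannot guarantee.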
