Differentially-Private Text Rewriting reshapes Linguistic Style

April 29, 2026 · arXiv:2604.26656

cs.CL

TLDR

Differentially-private text rewriting systematically alters linguistic style, making text less interactive and persuasive while preserving content.

Key contributions

DP text rewriting systematically alters linguistic style, not just lexical variation.
It causes a "functional mutation" of the text's communicative signature.
The shift is marked by severe attrition of interactive markers, contextual references, and complex subordination.
Both autoregressive and bidirectional DP methods converge to a non-involved, non-persuasive register.
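To make "attrition of interactive markers" concrete, here is a minimal sketch that measures a few involvement-style markers per 1,000 tokens in an original text and in a rewrite. It assumes a Biber-style register analysis is the relevant kind of profiling. The marker lists, the example sentences, and the function name `marker_rates` are hypothetical illustrations, not the paper's feature inventory. A real profile would rely on a POS tagger and a much larger, validated feature set.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny marker lists (NOT the paper's features).
MARKERS = {
    "first_second_person": {"i", "me", "my", "we", "us", "our", "you", "your"},
    "subordinators": {"because", "although", "whereas", "since", "unless", "while"},
    "place_time_deixis": {"here", "there", "now", "today", "yesterday", "tomorrow"},
}

def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; apostrophes are kept so contractions stay whole.
    return re.findall(r"[a-z']+", text.lower())

def marker_rates(text: str) -> dict[str, float]:
    """Return each marker category's frequency per 1,000 tokens."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    hits = {name: sum(counts[w] for w in words) for name, words in MARKERS.items()}
    # Direct questions are a classic interactive marker; count them by punctuation.
    hits["questions"] = text.count("?")
    if not tokens:
        return {name: 0.0 for name in hits}
    return {name: 1000.0 * h / len(tokens) for name, h in hits.items()}

# Invented example pair: an involved original and a flat, impersonal rewrite.
original = "I think you should come here now, because we need your help. Can you?"
rewritten = "The person should attend in order to provide assistance."

for label, text in [("original", original), ("rewritten", rewritten)]:
    print(label, {k: round(v, 1) for k, v in marker_rates(text).items()})
```

Normalizing by text length matters because a rewrite can be shorter or longer than its source, and raw counts would conflate that length change with a change in register.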

Why it matters

This paper reveals a critical, overlooked consequence of differentially-private text rewriting: it homogenizes linguistic style. Understanding this "register-blind sanitization" is crucial for developing DP methods that balance privacy against preserving the nuanced communicative intent of human-authored text. The finding matters for any application in which register, style, or reader engagement is part of what the text must convey.

Original Abstract

Differential Privacy (DP) for text matured from disjointed word-level substitutions to contiguous sentence-level rewriting by leveraging the generative capacity of language models. While this form of text privatization is best suited for balancing formal privacy guarantees with grammatical coherence, its impact on the register identity of text remains largely unexplored. By conducting a multidimensional stylistic profiling of differentially-private rewriting, we demonstrate that the cost of privacy extends far beyond lexical variation. Specifically, we find that rewriting under privacy constraints induces a systematic functional mutation of the text's communicative signature. This shift is characterized by the severe attrition of interactive markers, contextual references, and complex subordination. By comparing autoregressive paraphrasing against bidirectional substitution across a spectrum of privacy budgets, we observe that both architectures force convergence toward a non-involved and non-persuasive register. This register-blind sanitization effectively preserves semantic content but structurally homogenizes the nuanced stylistic markers that define human-authored discourse.
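The abstract compares rewriting methods "across a spectrum of privacy budgets." As a rough illustration of how a budget ε controls a single substitution step, the sketch below applies the textbook exponential mechanism, which samples a candidate with probability proportional to exp(ε·u / (2Δu)). This is a generic mechanism, not the specific method evaluated in the paper. The candidate words, their utility scores, and the sensitivity Δu = 1 are hypothetical, and the helper names `selection_probs` and `exponential_mechanism` are illustrative.

```python
import math
import random

def selection_probs(utilities, epsilon, sensitivity=1.0):
    """Exponential-mechanism probabilities: p_i ∝ exp(eps * u_i / (2 * sensitivity))."""
    scores = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    m = max(scores)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def exponential_mechanism(candidates, utilities, epsilon, sensitivity=1.0, rng=None):
    """Sample one candidate according to selection_probs."""
    if rng is None:
        rng = random.Random()
    probs = selection_probs(utilities, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]

# Hypothetical substitution candidates for the token "you", with made-up
# similarity utilities in [0, 1] (the original token scores highest).
candidates = ["you", "one", "someone", "people"]
utilities = [1.0, 0.8, 0.6, 0.5]

for eps in (0.5, 5.0, 50.0):
    probs = selection_probs(utilities, eps)
    print(eps, [f"{c}={p:.2f}" for c, p in zip(candidates, probs)])

print(exponential_mechanism(candidates, utilities, epsilon=5.0, rng=random.Random(0)))
```

Smaller ε flattens the distribution toward uniform, so an interactive token such as "you" is more likely to be swapped for an impersonal alternative; larger ε concentrates mass on the original token. This toy setup only illustrates the privacy/fidelity trade-off and makes no claim about the paper's measured effects.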
