Differentially-Private Text Rewriting reshapes Linguistic Style
TLDR
Differentially-private text rewriting systematically alters linguistic style, making text less interactive and less persuasive while preserving its semantic content.
Key contributions
- The cost of DP text rewriting extends beyond lexical variation: it systematically alters linguistic style.
- It causes a "functional mutation" of the text's communicative signature.
- This mutation shows up as severe attrition of interactive markers, contextual references, and complex subordination.
- Across privacy budgets, both autoregressive paraphrasing and bidirectional substitution push text toward a non-involved, non-persuasive register.
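To make "attrition of interactive markers" concrete, the sketch below computes per-1,000-word rates of a few involvement-style features and compares an original text with a rewrite. The marker lists, the function name `marker_rates`, and both example sentences are hypothetical and heavily simplified; they are not the feature set used in the paper's multidimensional profiling.

```python
import re

# Hypothetical, heavily simplified marker lists (assumption: NOT the paper's features).
FIRST_SECOND_PERSON = {"i", "me", "my", "we", "us", "our", "you", "your"}
SUBORDINATORS = {"because", "although", "since", "unless", "whereas", "while", "if"}


def marker_rates(text: str) -> dict[str, float]:
    """Return occurrences per 1,000 words for a few involvement-style markers."""
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)  # avoid division by zero on empty input
    counts = {
        "pronouns_1st_2nd": sum(w in FIRST_SECOND_PERSON for w in words),
        "subordinators": sum(w in SUBORDINATORS for w in words),
        "questions": text.count("?"),
    }
    return {name: 1000 * c / n for name, c in counts.items()}


# Hypothetical example pair: an involved original and a flattened rewrite.
original = "I think you should try it, because it worked for us. Would you?"
rewrite = "The approach may be effective. It produced results previously."
for label, t in [("original", original), ("rewrite", rewrite)]:
    print(label, marker_rates(t))
```

A real stylistic profile would use many more features, part-of-speech tagging, and normalization across corpora; word-list counting is chosen here only because it is easy to inspect.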
Why it matters
This paper reveals a critical, overlooked consequence of differentially-private text rewriting: it homogenizes linguistic style. Understanding this "register-blind sanitization" is crucial for developing DP methods that balance privacy with preserving the nuanced communicative intent of human-authored text. The effect matters most in applications where a text's style and its engagement with the reader are as important as its content.
Original Abstract
Differential Privacy (DP) for text matured from disjointed word-level substitutions to contiguous sentence-level rewriting by leveraging the generative capacity of language models. While this form of text privatization is best suited for balancing formal privacy guarantees with grammatical coherence, its impact on the register identity of text remains largely unexplored. By conducting a multidimensional stylistic profiling of differentially-private rewriting, we demonstrate that the cost of privacy extends far beyond lexical variation. Specifically, we find that rewriting under privacy constraints induces a systematic functional mutation of the text's communicative signature. This shift is characterized by the severe attrition of interactive markers, contextual references, and complex subordination. By comparing autoregressive paraphrasing against bidirectional substitution across a spectrum of privacy budgets, we observe that both architectures force convergence toward a non-involved and non-persuasive register. This register-blind sanitization effectively preserves semantic content but structurally homogenizes the nuanced stylistic markers that define human-authored discourse.
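The abstract compares rewriting methods "across a spectrum of privacy budgets." As a hedged illustration of how a budget ε can enter autoregressive rewriting, the sketch below assumes a DP-Prompt-style mechanism: next-token logits are clipped to a fixed range and sampled with a temperature derived from ε, which makes the sampling step an instance of the exponential mechanism. The summary does not say which mechanisms the paper actually evaluates; the function name `dp_sample_token` and the clipping bounds are hypothetical.

```python
import math
import random


def dp_sample_token(logits, epsilon, b_min=-5.0, b_max=5.0, rng=None):
    """Sample one token index via the exponential mechanism (DP-Prompt-style sketch).

    Clipping bounds the utility (logit) range to delta = b_max - b_min. Sampling from
    softmax(clipped / T) with T = 2 * delta / epsilon is the exponential mechanism,
    giving epsilon-DP for this single token. Bounds here are hypothetical defaults.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = random.Random() if rng is None else rng
    clipped = [min(max(x, b_min), b_max) for x in logits]
    temperature = 2.0 * (b_max - b_min) / epsilon
    scores = [x / temperature for x in clipped]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    probs = [w / total for w in weights]
    token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return token, probs


logits = [4.0, 1.0, -2.0]  # toy next-token logits (hypothetical)
for eps in (1.0, 10.0, 100.0):
    token, probs = dp_sample_token(logits, eps, rng=random.Random(0))
    print(f"epsilon={eps}: probs={[round(p, 3) for p in probs]}, sampled={token}")
```

A smaller ε raises the sampling temperature and flattens the next-token distribution toward uniform. Under basic sequential composition, a rewrite of n tokens sampled this way consumes n·ε in total, so per-token budgets and output length trade off directly.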