PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization
Murat Bilgehan Ertan, Xiaochen Zhu, Phuong Ha Nguyen, Marten van Dijk, Srinivas Devadas
TLDR
PACZero is a PAC-private zeroth-order method for fine-tuning LLMs that achieves strong privacy ($I=0$) with usable utility via sign quantization of aggregated gradients.
Key contributions
- Introduces PACZero, a PAC-private zeroth-order mechanism for fine-tuning large language models.
- Achieves an $I=0$ privacy regime, where membership inference success rate matches the prior.
- Uses sign quantization of subset-aggregated zeroth-order gradients to create 'unanimity' steps that cost zero mutual information (see the sketch after this list).
- PACZero-ZPL reaches 88.99% SST-2 accuracy at $I=0$, competitive with non-private baselines.
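To make the 'unanimity' idea concrete, here is a minimal, self-contained sketch (not the authors' code) of one sign-quantized zeroth-order step evaluated under several candidate secret subsets. The toy loss, data, and subset construction are assumptions for illustration only; the point is that when every candidate produces the same sign vector, the released sign cannot distinguish which subset is the secret.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta, batch):
    # Toy squared-error loss; a stand-in for the LM fine-tuning loss.
    x, y = batch
    return np.mean((x @ theta - y) ** 2)

def zo_sign_for_subset(theta, data, idx, z, eps=1e-3):
    """Sign-quantized two-point zeroth-order gradient estimate on one candidate subset."""
    x, y = data
    batch = (x[idx], y[idx])
    proj = (loss(theta + eps * z, batch) - loss(theta - eps * z, batch)) / (2 * eps)
    return np.sign(proj * z)  # elementwise sign of the ZO gradient estimate

# Toy data; the candidate subsets model PAC Privacy's "which subset is the secret?" question.
d, n = 8, 64
x, y = rng.normal(size=(n, d)), rng.normal(size=n)
theta = rng.normal(size=d)
candidates = [rng.choice(n, size=32, replace=False) for _ in range(5)]

z = rng.normal(size=d)  # shared perturbation direction across all candidates
signs = [zo_sign_for_subset(theta, (x, y), idx, z) for idx in candidates]
unanimous = all(np.array_equal(s, signs[0]) for s in signs)
print("unanimous step:", unanimous)
# On unanimous steps the released sign is identical under every candidate subset,
# so it carries zero conditional mutual information about the secret.
```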
Why it matters
This paper addresses the critical challenge of privately fine-tuning large language models without significant utility loss. By achieving $I=0$, a level at which membership-inference success is bounded at the prior, PACZero offers a practical option for deploying LLMs in sensitive applications, and it delivers usable utility in the high-privacy regime ($\varepsilon<1$ in DP terms) where prior methods do not.
Original Abstract
We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at $I(S^*; Y_{1:T})=0$. This privacy regime bounds the membership-inference attack (MIA) posterior success rate at the prior, an MIA-resistance level the DP framework matches only at $\varepsilon=0$ and infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the release depends on which candidate subset is the secret. Sign-quantizing subset-aggregated zeroth-order gradients creates frequent unanimity, steps at which every candidate subset agrees on the update direction; at these steps the released sign costs zero conditional mutual information. We propose two variants that span the privacy-utility trade-off: PACZero-MI (budgeted MI via exact calibration on the binary release) and PACZero-ZPL ($I=0$ via a uniform coin flip on disagreement steps). We evaluate on SST-2 and SQuAD with OPT-1.3B and OPT-6.7B in both LoRA and full-parameter tracks. On SST-2 OPT-1.3B full fine-tuning at $I=0$, PACZero-ZPL reaches ${88.99\pm0.91}$, within $2.1$pp of the non-private MeZO baseline ($91.1$ FT). No prior method produces usable utility in the high-privacy regime $\varepsilon<1$, and PACZero-ZPL obtains competitive SST-2 accuracy and nontrivial SQuAD F1 across OPT-1.3B and OPT-6.7B at $I=0$.
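The two variants in the abstract differ in how they handle steps where the candidate subsets disagree. Below is a hedged sketch of the PACZero-ZPL release rule as described there: release the agreed sign vector on unanimity, and an independent uniform ±1 "coin flip" vector on disagreement, so the released step is distributed identically under every candidate. The function name, inputs, and per-step granularity are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

def zpl_release(candidate_signs, rng):
    """Sketch of a ZPL-style release: the unanimous sign vector if every candidate
    subset agrees, otherwise a uniform random ±1 vector (coin flip) for that step."""
    S = np.stack(candidate_signs)                    # shape: (num_candidates, dim)
    if np.all(S == S[0]):                            # unanimity across all candidates
        return S[0]
    return rng.choice([-1.0, 1.0], size=S.shape[1])  # disagreement: uniform coin flip

# Toy usage: one sign-SGD-style parameter update with the privatized direction.
rng = np.random.default_rng(1)
theta = np.zeros(8)
candidate_signs = [np.sign(rng.normal(size=8)) for _ in range(5)]  # placeholder inputs
theta -= 1e-3 * zpl_release(candidate_signs, rng)
```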