PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization
Murat Bilgehan Ertan, Xiaochen Zhu, Phuong Ha Nguyen, Marten van Dijk, Srinivas Devadas
TLDR
PACZero is a PAC-private zeroth-order method for fine-tuning LLMs that achieves strong privacy ($I=0$) with usable utility via sign quantization of aggregated gradients.
Key contributions
- Introduces PACZero, a PAC-private zeroth-order mechanism for fine-tuning large language models.
- Achieves an $I=0$ privacy regime, where membership inference success rate matches the prior.
- Uses sign quantization of subset-aggregated zeroth-order gradients to create 'unanimity' steps that cost zero mutual information (see the sketch after this list).
- PACZero-ZPL reaches 88.99% SST-2 accuracy at $I=0$, competitive with non-private baselines.
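To make the 'unanimity' idea concrete, here is a minimal, self-contained sketch (not the authors' code) of one sign-quantized zeroth-order step evaluated under several candidate secret subsets. The toy loss, data, and subset construction are assumptions for illustration only; the point is that when every candidate produces the same sign vector, the released sign cannot distinguish which subset is the secret.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(theta, batch):
    # Toy squared-error loss; a stand-in for the LM fine-tuning loss.
    x, y = batch
    return np.mean((x @ theta - y) ** 2)

def zo_sign_for_subset(theta, data, idx, z, eps=1e-3):
    """Sign-quantized two-point zeroth-order gradient estimate on one candidate subset."""
    x, y = data
    batch = (x[idx], y[idx])
    proj = (loss(theta + eps * z, batch) - loss(theta - eps * z, batch)) / (2 * eps)
    return np.sign(proj * z)  # elementwise sign of the ZO gradient estimate

# Toy data; the candidate subsets model PAC Privacy's "which subset is the secret?" question.
d, n = 8, 64
x, y = rng.normal(size=(n, d)), rng.normal(size=n)
theta = rng.normal(size=d)
candidates = [rng.choice(n, size=32, replace=False) for _ in range(5)]

z = rng.normal(size=d)  # shared perturbation direction across all candidates
signs = [zo_sign_for_subset(theta, (x, y), idx, z) for idx in candidates]
unanimous = all(np.array_equal(s, signs[0]) for s in signs)
print("unanimous step:", unanimous)
# On unanimous steps the released sign is identical under every candidate subset,
# so it carries zero conditional mutual information about the secret.
```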
Why it matters
This paper addresses the critical challenge of privately fine-tuning large language models without significant utility loss. By achieving $I=0$, a level at which membership-inference success is bounded at the prior, PACZero offers a practical option for deploying LLMs in sensitive applications, and it delivers usable utility in the high-privacy regime ($\varepsilon<1$ in DP terms) where prior methods do not.
Original Abstract
We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at $I(S^*; Y_{1:T})=0$. This privacy regime bounds the membership-inference attack (MIA) posterior success rate at the prior, an MIA-resistance level the DP framework matches only at $\varepsilon=0$ and infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the release depends on which candidate subset is the secret. Sign-quantizing subset-aggregated zeroth-order gradients creates frequent unanimity, steps at which every candidate subset agrees on the update direction; at these steps the released sign costs zero conditional mutual information. We propose two variants that span the privacy-utility trade-off: PACZero-MI (budgeted MI via exact calibration on the binary release) and PACZero-ZPL ($I=0$ via a uniform coin flip on disagreement steps). We evaluate on SST-2 and SQuAD with OPT-1.3B and OPT-6.7B in both LoRA and full-parameter tracks. On SST-2 OPT-1.3B full fine-tuning at $I=0$, PACZero-ZPL reaches ${88.99\pm0.91}$, within $2.1$pp of the non-private MeZO baseline ($91.1$ FT). No prior method produces usable utility in the high-privacy regime $\varepsilon<1$, and PACZero-ZPL obtains competitive SST-2 accuracy and nontrivial SQuAD F1 across OPT-1.3B and OPT-6.7B at $I=0$.
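The two variants in the abstract differ in how they handle steps where the candidate subsets disagree. Below is a hedged sketch of the PACZero-ZPL release rule as described there: release the agreed sign vector on unanimity, and an independent uniform ±1 "coin flip" vector on disagreement, so the released step is distributed identically under every candidate. The function name, inputs, and per-step granularity are assumptions for illustration, not the paper's exact specification.

```python
import numpy as np

def zpl_release(candidate_signs, rng):
    """Sketch of a ZPL-style release: the unanimous sign vector if every candidate
    subset agrees, otherwise a uniform random ±1 vector (coin flip) for that step."""
    S = np.stack(candidate_signs)                    # shape: (num_candidates, dim)
    if np.all(S == S[0]):                            # unanimity across all candidates
        return S[0]
    return rng.choice([-1.0, 1.0], size=S.shape[1])  # disagreement: uniform coin flip

# Toy usage: one sign-SGD-style parameter update with the privatized direction.
rng = np.random.default_rng(1)
theta = np.zeros(8)
candidate_signs = [np.sign(rng.normal(size=8)) for _ in range(5)]  # placeholder inputs
theta -= 1e-3 * zpl_release(candidate_signs, rng)
```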