ArXiv TLDR

FedAttr: Towards Privacy-preserving Client-Level Attribution in Federated LLM Fine-tuning

2605.06596

Su Zhang, Junfeng Guo, Heng Huang

cs.CR, cs.LG

TLDR

FedAttr enables privacy-preserving client-level attribution in federated LLM fine-tuning to detect data ownership violations without compromising privacy.

Key contributions

  • Addresses the challenge of client-level watermark detection in Federated Learning while preserving privacy.
  • Proposes FedAttr, using paired-subset-difference and differential scoring to estimate and score client updates.
  • Achieves 100% True Positive Rate and 0% False Positive Rate, outperforming all baselines by at least 44.4% in TPR or 19.1% in FPR.
  • Maintains FL privacy guarantees with minimal overhead (6.3%) and bounded information leakage.
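The paired-subset-difference idea can be sketched in a deliberately simplified form: the server issues two secure-aggregation queries whose client subsets differ only in whether one client is included, then differences the two sums. Everything below (the toy `secure_aggregate` stand-in, client counts, dimensions) is illustrative, not the paper's implementation; in this toy version the estimate recovers the client's update exactly, whereas the actual protocol is designed to bound per-round leakage to O(d*/N).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: N clients, each holding a d-dimensional model update.
N, d = 8, 16
client_updates = rng.normal(size=(N, d))

def secure_aggregate(updates, subset):
    """Stand-in for a secure-aggregation (SA) query: the server only ever
    observes the SUM of the selected clients' updates, never any individual one."""
    return updates[list(subset)].sum(axis=0)

def estimate_client_update(updates, client, all_clients):
    """Paired-subset-difference: run two SA queries whose subsets differ
    only in `client`, and difference the aggregates."""
    others = set(all_clients) - {client}
    with_client = secure_aggregate(updates, others | {client})
    without_client = secure_aggregate(updates, others)
    return with_client - without_client  # estimate of this client's update

est = estimate_client_update(client_updates, client=3, all_clients=range(N))
```

In FedAttr this estimate is then fed to the watermark detector (differential scoring) rather than inspected directly, which is part of how the protocol keeps leakage bounded.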

Why it matters

As Federated Learning becomes prevalent for LLM fine-tuning, ensuring data ownership and detecting misuse is critical. Existing watermark methods struggle with FL's privacy mechanisms. FedAttr provides a robust solution, enabling client-level attribution without sacrificing privacy or performance, which is vital for trust and accountability in collaborative AI.

Original Abstract

Watermark radioactivity tests detect whether a model was trained on watermarked documents, and have become key tools for protecting data ownership in the fine-tuning of large language models (LLMs). Existing work has proved their effectiveness in centralized LLM fine-tuning. However, such methods face several challenges and remain underexplored in federated learning (FL), a widely applied paradigm for fine-tuning LLMs collaboratively on private data across different users. FL mainly ensures privacy through secure aggregation (SA), which lets the server aggregate updates while keeping each client's update private. This mechanism preserves privacy but makes it difficult to identify which client trained on watermarked documents. In this work, we propose FedAttr, a new client-level attribution protocol for FL. FedAttr identifies which clients trained on watermarked data via a paired-subset-difference mechanism, while preserving the privacy guarantees of SA and FL performance. FedAttr proceeds in three steps: (i) estimate each client's update by differencing two SA queries, (ii) score the estimate with the watermark detector via differential scoring, and (iii) combine scores across rounds via Stouffer's method. We show theoretically that FedAttr produces an unbiased estimator of each client's update with bounded mutual-information leakage (i.e., $O(d^*/N)$ per-round update). Moreover, FedAttr empirically achieves 100% TPR and 0% FPR, outperforming all baselines by at least 44.4% in TPR or 19.1% in FPR, with only 6.3% overhead relative to FL training time. Ablation studies confirm that FedAttr is robust to protocol parameters and configurations.
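Step (iii), Stouffer's method, is a standard way to combine independent test statistics: per-round z-scores are summed and divided by the square root of the number of rounds, so the combined statistic is again standard normal under the null hypothesis. A minimal sketch follows; the per-round scores are made-up numbers for illustration, not results from the paper.

```python
import math

def stouffer_combine(z_scores):
    """Stouffer's method: combine independent standard-normal z-scores.
    Under the null, sum(z) / sqrt(n) is again standard normal."""
    n = len(z_scores)
    return sum(z_scores) / math.sqrt(n)

def z_to_pvalue(z):
    """One-sided p-value for a standard-normal test statistic."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical per-round watermark-detector z-scores for one client:
round_scores = [1.8, 2.1, 1.5, 2.4]
z_combined = stouffer_combine(round_scores)
p_value = z_to_pvalue(z_combined)
```

Each round alone gives only weak evidence (z around 1.5 to 2.4), but the combined score is far larger, which is why accumulating evidence across rounds lets the protocol reach high TPR at low FPR.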
