Graph Representation Learning Augmented Model Manipulation on Federated Fine-Tuning of LLMs
Hanlin Cai, Kai Li, Houtianfu Wang, Haofan Dong, Yichen Li, et al.
TLDR
AugMP is a novel model manipulation strategy against federated fine-tuning of LLMs that uses graph representation learning to craft stealthy malicious updates, reducing global accuracy by up to 26% while evading common defenses.
Key contributions
- Proposes AugMP, a novel strategy to manipulate federated fine-tuning (FFT) of LLMs.
- Uses graph representation learning to guide the generation of stealthy malicious updates.
- Develops an iterative manipulation algorithm based on an augmented Lagrangian dual formulation.
- Achieves strong manipulation, reducing global LLM accuracy by up to 26% and average local-agent accuracy by up to 22%, while evading distance- and similarity-based defenses.
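The contributions above describe optimizing malicious updates so they embed an adversarial objective while staying "benign-like". The paper's actual formulation is not reproduced here, but the general augmented-Lagrangian idea can be sketched as follows: maximize alignment with an adversarial direction subject to staying within a distance budget of the benign update mean. All names (`augmp_sketch`, `adv_direction`, `tau`) and the specific objective are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def augmp_sketch(benign_updates, adv_direction, rho=1.0, tau=0.5,
                 lr=0.05, steps=200):
    """Illustrative augmented-Lagrangian manipulation (NOT the paper's code).

    Pushes a malicious update m along an adversarial direction while
    enforcing the stealth constraint ||m - mu|| <= tau, where mu is the
    mean of the benign updates.
    """
    mu = benign_updates.mean(axis=0)     # benign reference statistics
    m = mu.copy()                        # start from a benign-looking point
    lam = 0.0                            # Lagrange multiplier for the constraint
    for _ in range(steps):
        diff = m - mu
        dist = np.linalg.norm(diff) + 1e-12
        g = dist - tau                   # constraint violation (<= 0 when satisfied)
        # Gradient of L = -<m, adv> + lam * g + (rho/2) * max(g, 0)^2
        grad = -adv_direction + (lam + rho * max(g, 0.0)) * (diff / dist)
        m -= lr * grad                   # primal descent step
        lam = max(0.0, lam + rho * g)    # dual ascent on the multiplier
    return m
```

The dual ascent on `lam` progressively tightens the stealth constraint, so the returned update sits near the boundary of the benign region while still carrying the adversarial component.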
Why it matters
Federated fine-tuning (FFT) is vital for privacy-preserving LLM adaptation, but this paper reveals its vulnerability to sophisticated model manipulation. AugMP demonstrates a highly effective and stealthy attack, highlighting critical security gaps in current FFT-LLM systems and the urgent need for robust defenses.
Original Abstract
Federated fine-tuning (FFT) has emerged as a privacy-preserving paradigm for collaboratively adapting large language models (LLMs). Built upon federated learning, FFT enables distributed agents to jointly refine a shared pretrained LLM by aggregating local LLM updates without sharing local raw data. However, FFT-based LLMs remain vulnerable to model manipulation threats, in which adversarial participants upload manipulated LLM updates that corrupt the aggregation process and degrade the performance of the global LLM. In this paper, we propose an Augmented Model maniPulation (AugMP) strategy against FFT-based LLMs. Specifically, we design a novel graph representation learning framework that captures feature correlations among benign LLM updates to guide the generation of malicious updates. To enhance manipulation effectiveness and stealthiness, we develop an iterative manipulation algorithm based on an augmented Lagrangian dual formulation. Through this formulation, malicious updates are optimized to embed adversarial objectives while preserving benign-like parameter characteristics. Experimental results across multiple LLM backbones demonstrate that the AugMP strategy achieves the strongest manipulation performance among all competing baselines, reducing the global LLM accuracy by up to 26% and degrading the average accuracy of local LLM agents by up to 22%. Meanwhile, AugMP maintains high statistical and geometric consistency with benign updates, enabling it to evade conventional distance- and similarity-based defense methods.
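The abstract's claim about evading "distance- and similarity-based defense methods" refers to aggregation-time filters that reject updates which look geometrically unlike their peers. A toy version of such a filter, purely for illustration (the function name, threshold, and logic are assumptions, not a method from the paper), is:

```python
import numpy as np

def similarity_filter(updates, threshold=0.0):
    """Toy cosine-similarity defense (illustrative, not from the paper):
    reject any update whose cosine similarity to the mean of the other
    clients' updates falls below `threshold`."""
    accepted = []
    for i, u in enumerate(updates):
        # Mean of all updates except client i
        others = np.mean([v for j, v in enumerate(updates) if j != i], axis=0)
        cos = np.dot(u, others) / (
            np.linalg.norm(u) * np.linalg.norm(others) + 1e-12)
        if cos >= threshold:
            accepted.append(i)
    return accepted
```

A crude manipulation (e.g. negating the benign direction) is caught by such a check, which is why AugMP's stated goal is to keep malicious updates statistically and geometrically consistent with benign ones so they pass filters of this kind.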