ArXiv TLDR

PrivFedTalk: Privacy-Aware Federated Diffusion with Identity-Stable Adapters for Personalized Talking-Head Generation

arXiv: 2604.08037

Soumya Mazumdar, Vineet Kumar Rakesh, Tapas Samanta

cs.CR · cs.AI · cs.CV · cs.LG

TLDR

PrivFedTalk enables privacy-preserving personalized talking-head generation via federated learning with identity-stable adapters.

Key contributions

  • Introduces PrivFedTalk, a federated framework for personalized talking-head generation using local private data.
  • Employs lightweight LoRA identity adapters trained on-device, avoiding raw data sharing and reducing communication.
  • Proposes Identity-Stable Federated Aggregation (ISFA) to handle heterogeneous client distributions.
  • Applies Temporal-Denoising Consistency (TDC) regularization to reduce inter-frame flicker and identity drift, and client-level differential privacy with secure aggregation to protect adapter updates.
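The ISFA step above can be sketched as a reliability-weighted average of per-client LoRA adapter updates. This is a minimal illustration, not the paper's implementation: the function name `isfa_aggregate`, the dict-of-arrays update format, and the single combined reliability scalar per client are assumptions; the paper only specifies that privacy-safe scalar signals derived from identity-consistency and temporal-stability estimates weight the aggregation.

```python
import numpy as np

def isfa_aggregate(adapter_updates, reliability_scores):
    """Weighted aggregation of LoRA adapter updates (illustrative sketch).

    adapter_updates    : list of dicts {param_name: np.ndarray}, one per client
    reliability_scores : list of non-negative scalars, one per client,
                         standing in for on-device identity-consistency and
                         temporal-stability estimates
    """
    scores = np.asarray(reliability_scores, dtype=np.float64)
    weights = scores / scores.sum()  # normalize to a convex combination
    aggregated = {}
    for name in adapter_updates[0]:
        aggregated[name] = sum(
            w * client[name] for w, client in zip(weights, adapter_updates)
        )
    return aggregated

# Toy usage: three clients, one LoRA matrix each
clients = [{"lora_A": np.full((2, 2), v)} for v in (1.0, 2.0, 3.0)]
agg = isfa_aggregate(clients, [0.2, 0.3, 0.5])
# Weighted mean: 0.2*1 + 0.3*2 + 0.5*3 = 2.3 in every entry
```

Because the weights sum to one, an unreliable client (e.g. unstable identity estimates) is down-weighted rather than dropped, which keeps the aggregation robust under heterogeneous client distributions.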

Why it matters

This paper addresses critical privacy concerns in personalized talking-head generation by proposing a federated learning approach. It lets users train on sensitive identity-specific data locally, without ever sharing the raw audio-visual data, a significant step towards privacy-preserving generative AI in highly sensitive domains.

Original Abstract

Talking-head generation has advanced rapidly with diffusion-based generative models, but training usually depends on centralized face-video and speech datasets, raising major privacy concerns. The problem is more acute for personalized talking-head generation, where identity-specific data are highly sensitive and often cannot be pooled across users or devices. PrivFedTalk is presented as a privacy-aware federated framework for personalized talking-head generation that combines conditional latent diffusion with parameter-efficient identity adaptation. A shared diffusion backbone is trained across clients, while each client learns lightweight LoRA identity adapters from local private audio-visual data, avoiding raw data sharing and reducing communication cost. To address heterogeneous client distributions, Identity-Stable Federated Aggregation (ISFA) weights client updates using privacy-safe scalar reliability signals computed from on-device identity consistency and temporal stability estimates. Temporal-Denoising Consistency (TDC) regularization is introduced to reduce inter-frame drift, flicker, and identity drift during federated denoising. To limit update-side privacy risk, secure aggregation and client-level differential privacy are applied to adapter updates. The implementation supports both low-memory GPU execution and multi-GPU client-parallel training on heterogeneous shared hardware. Comparative experiments on the present setup across multiple training and aggregation conditions with PrivFedTalk, FedAvg, and FedProx show stable federated optimization and successful end-to-end training and evaluation under constrained resources. The results support the feasibility of privacy-aware personalized talking-head training in federated environments, while suggesting that stronger component-wise, privacy-utility, and qualitative claims need further standardized evaluation.
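The client-level differential privacy step in the abstract, applied to adapter updates before they leave the device, typically amounts to clipping the update's L2 norm and adding Gaussian noise. The sketch below is a generic illustration under that assumption; the function name, the `clip_norm`/`noise_multiplier` hyperparameters, and the dict-of-arrays update format are not from the paper.

```python
import numpy as np

def dp_protect_update(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip an adapter update to an L2 bound, then add Gaussian noise
    calibrated to that bound (standard Gaussian-mechanism recipe;
    illustrative, not the paper's exact procedure)."""
    rng = rng or np.random.default_rng(0)
    flat = np.concatenate([v.ravel() for v in update.values()])
    norm = np.linalg.norm(flat)
    scale = min(1.0, clip_norm / (norm + 1e-12))  # shrink only if too large
    protected = {}
    for name, v in update.items():
        clipped = v * scale
        noise = rng.normal(0.0, noise_multiplier * clip_norm, size=v.shape)
        protected[name] = clipped + noise
    return protected

# Toy usage: an update of norm 20 is clipped down to norm 1 before noising
out = dp_protect_update({"lora_A": np.full(4, 10.0)},
                        clip_norm=1.0, noise_multiplier=0.0)
```

With `noise_multiplier=0.0` the output is just the clipped update (norm exactly `clip_norm` here); in practice a positive multiplier trades utility for a formal privacy guarantee, and secure aggregation ensures the server only ever sees the noised sum of client updates.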
