Privacy-Preserving LLMs Routing

April 17, 2026 · arXiv:2604.15728

Xidong Wu, Yukuan Zhang, Yuqiong Ji, Reza Shirkavand, Qian Lou + 1 more

cs.CR, cs.AI

TLDR

PPRoute is a privacy-preserving LLM routing framework that combines MPC-friendly encoder operations, multi-step training, and an unsorted Top-k algorithm to match plaintext routing quality at roughly 20× the speed of naive MPC implementations.

Key contributions

Uses MPC-friendly operations to significantly speed up encoder inference for privacy-preserving routing.
Employs a multi-step training algorithm to maintain high routing quality in encrypted environments.
Introduces an unsorted Top-k algorithm with O(1) communication complexity for secure model search.
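
To make the unsorted Top-k idea concrete, here is a minimal plaintext sketch in Python. It is not PPRoute's algorithm, whose details are not given on this page; it only illustrates one well-known way to pick the k largest scores without sorting them: count, for every element, how many other elements beat it, and keep those whose count is below k. The function name `unsorted_top_k` and the tie-breaking rule are assumptions made for illustration. In an MPC setting each comparison would be a secure comparison on secret-shared values; because the comparisons do not depend on one another, they could all be issued as one batch, whereas a sort needs a sequence of dependent compare-and-swap stages.

```python
from typing import List


def unsorted_top_k(scores: List[float], k: int) -> List[int]:
    """Return a 0/1 mask marking the k largest scores, without sorting.

    Plaintext illustration only (hypothetical helper, not from the paper).
    """
    n = len(scores)
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and len(scores)")

    mask = []
    for i in range(n):
        # rank = number of elements that beat element i.
        # Ties are broken by index so the ranks form a permutation of 0..n-1.
        rank = sum(
            1
            for j in range(n)
            if scores[j] > scores[i] or (scores[j] == scores[i] and j < i)
        )
        # Each comparison above is independent of every other one, so all
        # of them could be evaluated together rather than stage by stage.
        mask.append(1 if rank < k else 0)
    return mask


if __name__ == "__main__":
    # Hypothetical similarity scores for four candidates.
    print(unsorted_top_k([0.2, 0.9, 0.5, 0.7], k=2))
```

The result is a selection mask, not an ordered list, which is all a Top-k lookup needs when the order among the selected items does not matter. The price of this sketch is a quadratic number of comparisons.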

Why it matters

LLM routing is crucial for balancing performance and cost, but the router sits between users and LLMs and therefore introduces new privacy risks to user data. Naive MPC implementations of routing incur prohibitive overhead. PPRoute addresses these privacy risks while matching the routing quality of plaintext counterparts and running approximately 20× faster than naive MPC implementations.

Original Abstract

Large language model (LLM) routing has emerged as a critical strategy to balance model performance and cost-efficiency by dynamically selecting services from various model providers. However, LLM routing adds an intermediate layer between users and LLMs, creating new privacy risks to user data. These privacy risks have not been systematically studied. Although cryptographic techniques such as Secure Multi-Party Computation (MPC) enable privacy-preserving computation, their protocol design and implementation remain under-explored, and naïve implementations typically incur prohibitive computational overhead. To address this, we propose a privacy-preserving LLM routing framework (PPRoute). PPRoute includes multiple strategies to speed up encoder inference and nearest neighbor search under the MPC and maintain the quality of LLM routing. First, PPRoute uses MPC-friendly operations to boost the encoder inference. Second, PPRoute uses a multiple-step model training algorithm to maintain routing quality despite the constraints of the encrypted domain. Third, PPRoute proposes an unsorted Top-k algorithm with $O(1)$ communication complexity for secure sorting in model search, significantly reducing communication latency. Across different datasets, PPRoute achieves the performance of plaintext counterparts, while achieving approximately a 20$\times$ speedup over naïve MPC implementations.
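
As background for why "MPC-friendly operations" matter, the following minimal Python sketch shows two-party additive secret sharing, a standard MPC building block. It is a generic illustration, not PPRoute's protocol: the ring size `MODULUS` and the helper names `share`, `reconstruct`, `add_shared`, and `scale_shared` are assumptions chosen for this example. The point it shows is that additions and multiplications by public constants can be done locally on shares with no communication, whereas comparisons and nonlinear functions need interactive protocols, which is where most of the MPC cost comes from.

```python
import secrets
from typing import Tuple

MODULUS = 2**64  # ring size for the shares; an illustrative choice

Shares = Tuple[int, int]  # (party 0's share, party 1's share)


def share(value: int) -> Shares:
    """Split value into two additive shares modulo MODULUS."""
    share0 = secrets.randbelow(MODULUS)  # uniformly random mask
    share1 = (value - share0) % MODULUS
    return share0, share1


def reconstruct(shares: Shares) -> int:
    """Recombine both shares to recover the secret."""
    return (shares[0] + shares[1]) % MODULUS


def add_shared(a: Shares, b: Shares) -> Shares:
    """Each party adds its own shares locally; no messages are exchanged."""
    return (a[0] + b[0]) % MODULUS, (a[1] + b[1]) % MODULUS


def scale_shared(a: Shares, public_constant: int) -> Shares:
    """Multiplying by a public constant is also a purely local operation."""
    return (a[0] * public_constant) % MODULUS, (a[1] * public_constant) % MODULUS


if __name__ == "__main__":
    x = share(12)
    y = share(30)
    print(reconstruct(add_shared(x, y)))   # 12 + 30
    print(reconstruct(scale_shared(x, 3)))  # 12 * 3
```

Each individual share is a uniformly random ring element, so neither party learns the secret on its own. Operations outside this linear family, such as the comparisons used in nearest neighbor search, cannot be computed share by share, which is why reducing or simplifying them has a large effect on latency.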


