ArXiv TLDR

Privacy-Preserving Product-Quantized Approximate Nearest Neighbor Search Framework for Large-scale Datasets via A Hybrid of Fully Homomorphic Encryption and Trusted Execution Environment

arXiv: 2604.17816

Shozo Saeki, Minoru Kawahara, Hirohisa Aman

cs.CR

TLDR

PPPQ-ANN provides privacy-preserving ANN search for large-scale datasets via a hybrid of FHE and TEE, achieving practical security and performance.

Key contributions

  • Introduces PPPQ-ANN, a privacy-preserving ANN framework for large-scale datasets.
  • Employs an FHE/TEE hybrid to provide multi-layered security for vectors.
  • Optimizes FHE computations using Product-Quantization and data packing.
  • Achieves practical performance: <2h database generation, >50 QPS on million-scale data.
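The Product-Quantization step listed above can be sketched in plain Python. This is a generic PQ illustration, not the paper's PPPQ-ANN scheme: the tiny codebooks below are assumed toy values (real codebooks are learned with k-means over each subspace), and no encryption is involved. It shows the two operations that make PQ attractive here: encoding a vector into a few small codes, and asymmetric distance computation (ADC) that compares a raw query against stored codes via per-subspace centroid lookups.

```python
# Minimal Product-Quantization sketch (illustrative only; not the paper's
# implementation). Codebooks are hypothetical fixed centroids; in practice
# they are trained with k-means per subspace.

def pq_encode(vec, codebooks):
    """Split `vec` into len(codebooks) subvectors and store, for each,
    the index of the nearest centroid in that subspace's codebook."""
    m = len(codebooks)
    d_sub = len(vec) // m
    codes = []
    for i, cb in enumerate(codebooks):
        sub = vec[i * d_sub:(i + 1) * d_sub]
        # nearest centroid by squared Euclidean distance
        codes.append(min(range(len(cb)),
                         key=lambda j: sum((a - b) ** 2
                                           for a, b in zip(sub, cb[j]))))
    return codes

def adc_distance(query, codes, codebooks):
    """Asymmetric distance: compare the raw query against the centroids
    selected by the stored codes (one cheap lookup per subspace)."""
    m = len(codebooks)
    d_sub = len(query) // m
    dist = 0.0
    for i, (code, cb) in enumerate(zip(codes, codebooks)):
        sub = query[i * d_sub:(i + 1) * d_sub]
        dist += sum((a - b) ** 2 for a, b in zip(sub, cb[code]))
    return dist

# Toy example: 4-dim vectors, 2 subspaces, 2 centroids each (assumed values).
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],   # subspace 0
    [(0.0, 1.0), (1.0, 0.0)],   # subspace 1
]
codes = pq_encode([0.9, 1.1, 0.1, 0.9], codebooks)  # -> [1, 0]
```

Because a database vector is reduced to a handful of small codes and distances reduce to table lookups, far fewer values ever need to be touched under encryption, which is what makes the FHE side of the hybrid tractable.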

Why it matters

Nearest-neighbor search is vital for LLMs/VLMs, but the underlying vector data poses privacy risks. Existing TEE- or FHE-based solutions do not achieve practical security or performance. PPPQ-ANN's hybrid FHE/TEE approach delivers practical privacy-preserving ANN search, and database generation, for large-scale datasets, addressing a critical gap.

Original Abstract

A nearest-neighbor framework is a fundamental tool for various applications involving Large Language Models (LLMs) and Visual Language Models (VLMs). The vectors used for nearest-neighbor search carry rich information for similarity comparison, which exposes them to security risks such as embedding-inversion and membership attacks. Privacy-Preserving Approximate Nearest-Neighbor (PP-ANN) approaches are therefore necessary for highly confidential data. However, conventional PP-ANN approaches based on a Trusted Execution Environment (TEE) or Fully Homomorphic Encryption (FHE) do not achieve practical security or performance. Additionally, conventional approaches focus on the search process rather than on nearest-neighbor database generation. To address these issues, we propose a Privacy-Preserving Product-Quantization Approximate Nearest Neighbor (PPPQ-ANN) framework. PPPQ-ANN provides a multi-layered security structure for vectors based on a hybrid of FHE and TEE, and it minimizes FHE ciphertext computations by combining Product Quantization (PQ) with optimized data packing. We demonstrate the performance of PPPQ-ANN on million-scale datasets: it completes database generation in less than 2 hours and achieves more than 50 QPS in a sequential search while preserving privacy. PPPQ-ANN thus optimizes the trade-off between security and performance by utilizing a hybrid of FHE and TEE, achieving practical performance while preserving privacy.
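The abstract's point about minimizing FHE ciphertext computations via "optimized data packing" can be illustrated without an FHE library. Batched schemes (e.g. BFV/CKKS-style) pack many plaintext values into the slots of a single ciphertext, so one homomorphic operation acts on all slots at once. The plain-Python stand-in below assumes a toy slot count and performs no real encryption; `pack` and `hom_add` are hypothetical names, not the paper's API.

```python
# Illustrative stand-in for SIMD-style data packing in batched FHE.
# No encryption happens here: a "ciphertext" is just a fixed-size list
# of slots, and one "homomorphic" add updates every slot simultaneously.

SLOTS = 4  # assumed toy slot count; real schemes pack thousands of slots

def pack(values, slots=SLOTS):
    """Group values into fixed-size 'ciphertexts', zero-padding the last."""
    padded = values + [0.0] * (-len(values) % slots)
    return [padded[i:i + slots] for i in range(0, len(padded), slots)]

def hom_add(ct_a, ct_b):
    """One 'homomorphic' operation covers all slots at once."""
    return [a + b for a, b in zip(ct_a, ct_b)]

# Accumulating per-subspace partial distances: four scalar additions
# collapse into a single packed operation.
cts_a = pack([1.0, 2.0, 3.0, 4.0])
cts_b = pack([0.5, 0.5, 0.5, 0.5])
result = [hom_add(a, b) for a, b in zip(cts_a, cts_b)]  # one op, four sums
```

The design point this mirrors is that the cost of FHE is counted in ciphertext operations, so laying out PQ distance tables to fill ciphertext slots densely reduces the operation count roughly in proportion to the slot count.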
