Implementation and Privacy Guarantees for Scalable Keyword Search on SOLID-based Decentralized Data with Granular Visibility Constraints
Mohamed Ragab, Faria Ferooz, Mohammad Bahrani, Helen Oliver, Thanassis Tiropanis, et al.
TLDR
ESPRESSO enables scalable, privacy-preserving keyword search across decentralized Solid pods while respecting user visibility policies.
Key contributions
- Introduces ESPRESSO, a framework for scalable keyword search on decentralized Solid pods.
- Employs WebID-scoped indexes and privacy-aware metadata for efficient search.
- Presents a formal threat model analyzing metadata leakage and inference risks.
- Derives design principles to enhance privacy and mitigate unauthorized inference.
Why it matters
Decentralized data ecosystems like Solid give users control over their data, but they make privacy-preserving search difficult: data is scattered across many pods, each with its own access constraints. This paper provides both a search framework and a rigorous privacy analysis, offering a foundation for building secure, scalable decentralized data systems.
Original Abstract
In decentralized personal data ecosystems grounded in architectures such as Solid, users retain sovereignty over their data via personal online data stores (pods), hosted on Solid-compliant server infrastructures. In such environments, data remains under the control of pod owners, which complicates search due to distribution across numerous pods and user-specific access constraints. ESPRESSO is a decentralized framework for scalable keyword-based search across distributed Solid pods under user-defined visibility policies. It addresses key challenges of decentralized search by constructing WebID-scoped indexes within pods and employing privacy-aware metadata to enable efficient source selection and ranking across servers. This paper further introduces a formal threat model for ESPRESSO, analysing the security and privacy risks associated with the generation, aggregation, and use of indexes and metadata. These risks include unintended metadata leakage and the potential for adversaries to infer sensitive information about data that resides within personal data stores. The analysis identifies key design principles that limit metadata exposure while mitigating unauthorized inference. The proposed threat model provides a foundation for evaluating privacy-preserving decentralized search and informs the design of systems with stronger privacy guarantees.
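The abstract's core mechanism, per-pod indexes whose results are filtered by user-defined visibility policies, can be sketched as a toy inverted index. This is an illustration only, not ESPRESSO's actual data structure: the `PodIndex` class, its methods, and the ACL representation (a set of permitted WebIDs per resource) are hypothetical simplifications of Solid's access-control model.

```python
from collections import defaultdict

class PodIndex:
    """Illustrative per-pod inverted index: keyword -> postings,
    where each posting records a resource and the set of WebIDs
    allowed to see it (a stand-in for Solid ACL checks)."""

    def __init__(self):
        # keyword -> list of (resource, allowed_webids)
        self._index = defaultdict(list)

    def add(self, resource, text, allowed_webids):
        # Index each unique keyword of the resource's text,
        # tagging the posting with its visibility constraint.
        for kw in set(text.lower().split()):
            self._index[kw].append((resource, frozenset(allowed_webids)))

    def search(self, keyword, requester_webid):
        # Return only resources the requesting WebID may see,
        # so the index never reveals hidden matches.
        return [res for res, acl in self._index.get(keyword.lower(), [])
                if requester_webid in acl]

idx = PodIndex()
idx.add("notes.ttl", "solid search privacy",
        {"https://alice.example/#me"})
idx.add("cv.ttl", "search jobs",
        {"https://alice.example/#me", "https://bob.example/#me"})

# Bob sees only the resource shared with him.
print(idx.search("search", "https://bob.example/#me"))    # ['cv.ttl']
print(idx.search("privacy", "https://bob.example/#me"))   # []
```

Filtering at query time, as above, is the simplest policy-respecting design; the paper's threat model concerns what such indexes and their aggregated metadata can still leak (e.g., an adversary inferring that a hidden match exists at all).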