ArXiv TLDR

VeriX-Anon: A Multi-Layered Framework for Mathematically Verifiable Outsourced Target-Driven Data Anonymization

2604.12431

Miit Daga, Swarna Priya Ramu

cs.CR, cs.DB, cs.LG

TLDR

VeriX-Anon is a multi-layered framework for mathematically verifiable outsourced Target-Driven k-anonymization that lets the data owner check whether a cloud provider faithfully executed the contracted privacy-sensitive data transformation.

Key contributions

  • VeriX-Anon offers a multi-layered framework for mathematically verifiable outsourced target-driven k-anonymization.
  • It integrates deterministic (Merkle-style hashing), probabilistic (Sentinels, Twins), and XAI-based utility verification.
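  • It detected deviations in 11 of 12 evaluated scenarios against Lazy, Dumb, and Approximate adversaries; no single layer achieved this alone.
  • Client-side verification completes in under one second at one million rows, and Target-Driven anonymization preserves significantly more utility than Blind anonymization (Wilcoxon $p = 0.000977$, Cohen's $d = 1.96$).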
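
The deterministic layer can be pictured with a small sketch of Merkle-style hashing over a decision tree. This summary does not describe the paper's actual Authenticated Decision Tree construction, so everything below is an assumption made for illustration: the dictionary node layout, the `node_hash` helper, the domain-separation bytes, and the toy tree are all hypothetical.

```python
import copy
import hashlib
import json


def node_hash(node):
    """Return a SHA-256 hex digest that commits to a node and its whole subtree.

    Assumed (hypothetical) node layout:
      leaf:     {"label": ...}
      internal: {"feature": ..., "threshold": ..., "left": node, "right": node}
    """
    if "label" in node:
        payload = json.dumps({"label": node["label"]}, sort_keys=True).encode()
        # 0x00 prefix marks a leaf so a leaf can never collide with an internal node.
        return hashlib.sha256(b"\x00" + payload).hexdigest()

    payload = json.dumps(
        {"feature": node["feature"], "threshold": node["threshold"]},
        sort_keys=True,
    ).encode()
    left = node_hash(node["left"]).encode()
    right = node_hash(node["right"]).encode()
    # 0x01 prefix marks an internal node; child digests are folded into the parent.
    return hashlib.sha256(b"\x01" + payload + left + right).hexdigest()


# Toy tree the data owner expects the provider to use (illustrative values only).
expected_tree = {
    "feature": "age",
    "threshold": 40,
    "left": {"label": "<=50K"},
    "right": {
        "feature": "hours_per_week",
        "threshold": 45,
        "left": {"label": "<=50K"},
        "right": {"label": ">50K"},
    },
}

# A provider that silently changes one split produces a different root digest.
tampered_tree = copy.deepcopy(expected_tree)
tampered_tree["right"]["threshold"] = 50

expected_root = node_hash(expected_tree)
tampered_root = node_hash(tampered_tree)
```

Because each parent digest covers its children's digests, comparing one root value is enough to notice a change anywhere in the tree. As the abstract notes, this kind of check alone does not catch an adversary that returns a valid hash while deviating elsewhere, which is why the other layers exist.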

Why it matters

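This paper addresses the need to verify outsourced privacy-sensitive data anonymization, a growing concern for organizations using cloud services. VeriX-Anon combines three verification layers so that the data owner can check whether the contracted anonymization algorithm was faithfully executed, and its Target-Driven approach preserves more utility than Blind anonymization. The evaluation also reports a limitation: the XAI layer missed the Approximate adversary on the severely imbalanced Diabetes dataset, which the authors take as evidence that adaptive thresholding is needed.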

Original Abstract

Organisations increasingly outsource privacy-sensitive data transformations to cloud providers, yet no practical mechanism lets the data owner verify that the contracted algorithm was faithfully executed. VeriX-Anon is a multi-layered verification framework for outsourced Target-Driven k-anonymization combining three orthogonal mechanisms: deterministic verification via Merkle-style hashing of an Authenticated Decision Tree, probabilistic verification via Boundary Sentinels near the Random Forest decision boundary and exact-duplicate Twins with cryptographic identifiers, and utility-based verification via Explainable AI fingerprinting that compares SHAP value distributions before and after anonymization using the Wasserstein distance. Evaluated on three cross-domain datasets against Lazy (drops 5 percent of records), Dumb (random splitting, fake hash), and Approximate (random splitting, valid hash) adversaries, VeriX-Anon correctly detected deviations in 11 of 12 scenarios. No single layer achieved this alone. The XAI layer was the only mechanism that caught the Approximate adversary, succeeding on Adult and Bank but failing on the severely imbalanced Diabetes dataset where class imbalance suppresses the SHAP signal, confirming the need for adaptive thresholding. An 11-point k-sweep showed Target-Driven anonymization preserves significantly more utility than Blind anonymization (Wilcoxon $p = 0.000977$, Cohen's $d = 1.96$, mean F1 gap $+0.1574$). Client-side verification completes under one second at one million rows. The threat model covers three empirically evaluated profiles and one theoretical profile (Informed Attacker) aware of trap embedding but unable to defeat the cryptographic salt. Sentinel evasion probability ranges from near-zero for balanced datasets to 0.52 for imbalanced ones, a limitation the twin layer compensates for in every tested scenario.
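
The utility-based layer described in the abstract compares SHAP value distributions before and after anonymization using the Wasserstein distance. The following is a minimal sketch of that comparison for one-dimensional samples, not the paper's implementation. It assumes the per-feature SHAP values have already been computed and are supplied as plain lists; the function names, the sample values, and the fixed threshold of 0.1 are hypothetical.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """First Wasserstein distance between two 1-D empirical samples.

    Computed as the area between the two empirical CDFs.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    points = sorted(u_sorted + v_sorted)
    total = 0.0
    for a, b in zip(points, points[1:]):
        cdf_u = bisect_right(u_sorted, a) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, a) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (b - a)
    return total


def flag_shifted_features(before, after, threshold):
    """Return the features whose SHAP distribution moved by more than threshold.

    `before` and `after` map feature name -> list of SHAP values (assumed inputs).
    """
    return [
        name
        for name in before
        if wasserstein_1d(before[name], after[name]) > threshold
    ]


# Illustrative SHAP samples (made-up numbers, not results from the paper).
shap_before = {"age": [0.1, 0.2, 0.3], "hours_per_week": [0.0, 0.05, 0.1]}
shap_after = {"age": [0.6, 0.7, 0.8], "hours_per_week": [0.0, 0.05, 0.1]}

flagged = flag_shifted_features(shap_before, shap_after, threshold=0.1)
```

A single fixed threshold is the simplest choice, but the abstract reports that this layer failed on the severely imbalanced Diabetes dataset, where class imbalance suppresses the SHAP signal, so a per-dataset or adaptive threshold would likely be needed in practice.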
