ArXiv TLDR

Differentially Private Auditing Under Strategic Response

2605.07674

Florian A. D. Burnat

cs.GT, cs.CR, cs.LG

TLDR

This paper models differentially private AI audits as a Stackelberg game between an auditor and a strategic developer, and proposes SPAD, a projected-gradient algorithm that allocates the DP budget while accounting for the developer's best response.

Key contributions

  • Models DP audits as a bilevel Stackelberg game where developers strategically respond.
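  • Introduces the welfare-weighted under-detection gap ($B_w$): the welfare-weighted true residual harm the audit fails to detect at the developer's strategic best response.
  • Proves that naive DP budget allocation (uniform or harm-proportional) induces a strictly larger $B_w$ than any non-strategic mitigation baseline when detectability is heterogeneous, welfare weights are not comonotone with detectability, and the developer's optimum is interior.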
  • Proposes SPAD, a projected-gradient algorithm for optimal strategic private audit design.
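
To make the last contribution concrete, here is a minimal toy sketch of projected-gradient budget allocation with a hypergradient through the developer's best response. It is not the paper's SPAD implementation. Everything specific is an assumption made for illustration: the detection model `p_i = 1 - exp(-d_i * eps_i)`, the quadratic mitigation cost `0.5 * c_i * m_i^2`, the penalty term, and all function and parameter names. Under these assumptions the developer's best response has a closed form, so the hypergradient follows from the chain rule.

```python
import numpy as np

def project_simplex(v, total):
    """Euclidean projection of v onto {x >= 0, sum(x) = total}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - total
    ks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def detect_prob(eps, d):
    # ASSUMED detection model: more DP budget means higher detection probability.
    return 1.0 - np.exp(-d * eps)

def best_response(eps, d, c, penalty):
    # ASSUMED developer problem: minimize over m in [0, 1]
    #   sum_i penalty * p_i * (1 - m_i) + 0.5 * c_i * m_i**2
    # whose minimizer is m_i = penalty * p_i / c_i, clipped to [0, 1].
    return np.clip(penalty * detect_prob(eps, d) / c, 0.0, 1.0)

def gap(eps, w, d, c, penalty):
    # Toy under-detection gap: welfare weight * miss probability * residual harm.
    p = detect_prob(eps, d)
    m = best_response(eps, d, c, penalty)
    return float(np.sum(w * (1.0 - p) * (1.0 - m)))

def hypergradient(eps, w, d, c, penalty):
    # Total derivative of the gap w.r.t. eps, including the term that flows
    # through the developer's best response m(eps).
    p = detect_prob(eps, d)
    dp = d * np.exp(-d * eps)
    m = best_response(eps, d, c, penalty)
    interior = (m > 0.0) & (m < 1.0)
    dm = np.where(interior, penalty * dp / c, 0.0)
    return w * (-dp * (1.0 - m) - (1.0 - p) * dm)

def toy_spad(w, d, c, penalty, total, steps=500, lr=0.05):
    # Start from the uniform allocation and keep the best iterate seen.
    eps = np.full(len(w), total / len(w))
    best_eps, best_gap = eps.copy(), gap(eps, w, d, c, penalty)
    for _ in range(steps):
        grad = hypergradient(eps, w, d, c, penalty)
        eps = project_simplex(eps - lr * grad, total)
        g = gap(eps, w, d, c, penalty)
        if g < best_gap:
            best_eps, best_gap = eps.copy(), g
    return best_eps, best_gap

if __name__ == "__main__":
    w = np.array([3.0, 1.0, 1.0])   # welfare weights (hypothetical)
    d = np.array([0.5, 2.0, 2.0])   # detectability per harm dimension (hypothetical)
    c = np.array([4.0, 4.0, 4.0])   # mitigation-cost curvature (hypothetical)
    eps, b = toy_spad(w, d, c, penalty=2.0, total=3.0)
    uniform = gap(np.full(3, 1.0), w, d, c, penalty=2.0)
    print("allocation:", eps, "gap:", b, "uniform gap:", uniform)
```

The example deliberately makes the highest-weight dimension the hardest to detect, which is the kind of mismatch between welfare weights and detectability that the paper's result is about. Because the loop keeps the best iterate and starts from the uniform allocation, the returned gap is never worse than uniform in this toy setting.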

Why it matters

AI system audits increasingly rely on differential privacy, but developers can respond strategically to the privacy-constrained audit interface. This paper offers a game-theoretic framework and an algorithm for allocating the DP budget with that response in mind, so that less welfare-weighted harm goes undetected.

Original Abstract

Regulatory audits of AI systems increasingly rely on differential privacy (DP) to protect training data and model internals. We study audit design when the audited developer can strategically respond to the privacy-constrained audit interface. We formalize privacy-constrained auditing as a bilevel Stackelberg game, in which an auditor commits to a query policy and DP budget allocation across harm dimensions, and a strategic developer reallocates mitigation efforts in response. We introduce the welfare-weighted under-detection gap $B_w$, the welfare-weighted true residual harm the audit fails to detect at the developer's strategic best response, and prove that naive DP auditing (uniform or harm-proportional allocation) induces a strictly larger $B_w$ than any non-strategic mitigation baseline whenever effective detectability is heterogeneous, the welfare weights are not comonotone with detectability, and the developer's optimum is interior. We characterize the optimal auditor allocation as a four-factor balance of welfare weight, audit miss-probability, detectability elasticity, and mitigation-cost curvature, and provide a single-level reformulation of the bilevel problem via the developer's KKT system. We propose Strategic Private Audit Design (SPAD), a projected-gradient algorithm with hypergradients computed through the developer's best response.
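
As a reading aid for the phrase "hypergradients computed through the developer's best response", the standard implicit-differentiation identity for a bilevel problem is sketched below. The symbols are assumptions, not the paper's notation: $F$ is the auditor's objective, $g$ is the developer's objective, $\varepsilon$ is the DP budget allocation, and $m^*(\varepsilon)$ is the developer's best response, assumed interior and unique with $\nabla^2_{mm} g$ invertible.

```latex
% Lower level (developer): m^*(\varepsilon) = \arg\min_m g(\varepsilon, m)
% Interior first-order condition: \nabla_m g(\varepsilon, m^*(\varepsilon)) = 0
\frac{\partial m^*}{\partial \varepsilon}
  = -\left[\nabla^2_{mm} g\right]^{-1} \nabla^2_{m\varepsilon} g ,
\qquad
\frac{\mathrm{d}}{\mathrm{d}\varepsilon} F\bigl(\varepsilon, m^*(\varepsilon)\bigr)
  = \nabla_\varepsilon F
  + \left(\frac{\partial m^*}{\partial \varepsilon}\right)^{\!\top} \nabla_m F .
```

The first expression comes from differentiating the first-order condition with respect to $\varepsilon$; the second is the chain rule. A projected-gradient method would then step along the negative of this total derivative and project back onto the feasible budget set.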
