Risk-Controlled Post-Processing of Decision Policies
Sunay Joshi, Tao Wang, Hamed Hassani, Edgar Dobriban
TLDR
This paper introduces risk-controlled post-processing for decision policies: given a deterministic baseline policy, it selects a new policy that maximizes agreement with the baseline subject to a chance constraint on a user-specified loss.
Key contributions
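- Proposes risk-controlled post-processing: maximize agreement with a deterministic baseline policy subject to a chance constraint on a user-specified loss.
- Shows that the population-optimal policy has a threshold structure: it follows the baseline except on contexts where switching to the oracle fallback policy yields a large reduction in conditional violation risk.
- Develops a finite-sample algorithm that uses calibration data to select the threshold, given a fitted fallback policy and score; under regularity conditions in the i.i.d. setting, the expected excess risk is $O(\log n/n)$.
- Shows that when an exact-safe fallback policy is available, the algorithm achieves precise expected risk control under exchangeability, with high-probability near-optimality guarantees.
- Reports experiments (COVID-19 radiograph diagnosis, LLM routing, synthetic multiclass decisions) in which targeted post-processing meets or nearly meets risk budgets while preserving substantially more baseline agreement than score-blind random mixing.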
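One way to write down the problem and the threshold structure described above is sketched here. The notation is an assumption made for illustration and is not taken from the paper: $\pi_0$ is the baseline policy, $\pi_{\mathrm{fb}}$ the fallback, $\ell$ the user-specified loss, $\tau$ a loss tolerance, $\alpha$ the risk budget, and $r_\pi(x)$ the conditional violation risk of policy $\pi$ at context $x$.

$$
\max_{\pi}\ \Pr\big(\pi(X) = \pi_0(X)\big)
\quad \text{subject to} \quad
\Pr\big(\ell(\pi(X), Y) > \tau\big) \le \alpha
$$

$$
\pi^{*}(x) =
\begin{cases}
\pi_0(x), & r_{\pi_0}(x) - r_{\pi_{\mathrm{fb}}}(x) \le t, \\
\pi_{\mathrm{fb}}(x), & \text{otherwise},
\end{cases}
\qquad
r_{\pi}(x) = \Pr\big(\ell(\pi(x), Y) > \tau \mid X = x\big)
$$

Under this reading, the threshold $t$ is set as large as the budget $\alpha$ allows, so the policy switches only where the reduction in conditional violation risk is largest.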
Why it matters
This work addresses a critical challenge in deploying predictive models: integrating them with existing decision policies while rigorously controlling risk. By providing a theoretically sound and empirically validated post-processing method, it offers a practical solution for stakeholders to adopt new models safely and effectively.
Original Abstract
Predictive models are often deployed through existing decision policies that stakeholders are reluctant to change unless a risk constraint requires intervention. We study risk-controlled post-processing: given a deterministic baseline policy, choose a new policy that maximizes agreement with the baseline subject to a chance constraint on a user-specified loss. At the population level, we show that the optimal policy has a threshold structure: it follows the baseline except on contexts where switching to the oracle fallback policy yields a large reduction in conditional violation risk. At the finite-sample level, given a fitted fallback policy and score, we develop a post-processing algorithm that uses calibration data to select a threshold. Leveraging tools from algorithmic stability and stochastic processes, we show that under regularity conditions, in the i.i.d. setting, the expected excess risk of the post-processed policy is $O(\log n/n)$. In the special case when an exact-safe fallback policy is available, the algorithm achieves precise expected risk control under exchangeability. In this setting, we also give high-probability near-optimality guarantees on the post-processed policy. Experiments on a COVID-19 radiograph diagnosis task, an LLM routing problem, and a synthetic multiclass decision task show that targeted post-processing can meet or nearly meet risk budgets while preserving substantially more agreement with the baseline than score-blind random mixing.
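The calibration step mentioned in the abstract can be illustrated with a minimal sketch for the exact-safe fallback case. This is not the authors' algorithm: the function names, the assumption that a higher score means a larger benefit from switching, and the `(violations + 1) / (n + 1)` adjustment (borrowed from conformal-style risk control) are all assumptions made here for illustration.

```python
from bisect import bisect_right
from itertools import accumulate
import math


def select_threshold(scores, baseline_violates, alpha):
    """Pick the largest threshold t such that keeping the baseline on
    calibration points with score <= t stays within the risk budget.

    Assumptions (hypothetical, for illustration only):
      - the fallback never violates the loss constraint (exact-safe),
      - a higher score means switching to the fallback helps more,
      - the budget check uses the adjusted rate (violations + 1) / (n + 1).
    """
    n = len(scores)
    pairs = sorted(zip(scores, baseline_violates))
    sorted_scores = [s for s, _ in pairs]
    # cum[k] = baseline violations among the k lowest-scored points
    cum = [0] + list(accumulate(int(v) for _, v in pairs))

    candidates = sorted(set(scores))
    best = -math.inf  # -inf means "always use the fallback"
    for t in candidates:
        kept = bisect_right(sorted_scores, t)  # points with score <= t
        if (cum[kept] + 1) / (n + 1) <= alpha:
            best = t
        else:
            break  # adjusted risk is nondecreasing in t
    if candidates and best == candidates[-1]:
        return math.inf  # baseline alone fits the budget: never switch
    return best


def post_process(baseline_action, fallback_action, score, threshold):
    """Follow the baseline unless the score exceeds the threshold."""
    return fallback_action if score > threshold else baseline_action


# Toy calibration set: baseline violations concentrate at high scores.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
cal_violates = [0, 0, 0, 0, 1, 0, 1, 1, 1]
t_hat = select_threshold(cal_scores, cal_violates, alpha=0.25)
agreement = sum(s <= t_hat for s in cal_scores) / len(cal_scores)
```

On this toy data the selected threshold keeps the baseline on the six lowest-scored points and switches on the rest. A score-blind random mixture would instead spread its switches uniformly over contexts, which is why the abstract reports lower baseline agreement for it at a comparable risk level.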