Beyond Fixed False Discovery Rates: Post-Hoc Conformal Selection with E-Variables

April 13, 2026 | arXiv:2604.11305

cs.LG, cs.IT, stat.ML

TLDR

Post-hoc Conformal Selection (PH-CS) offers adaptive false discovery rate control, letting users balance selection size and FDR based on observed data.

Key contributions

Addresses the limitation of fixed FDR levels in existing Conformal Selection methods.
Generates a path of candidate selection sets with data-driven FDP estimates.
Enables users to select an optimal operating point by maximizing a specified utility.
Provides a finite-sample post-hoc reliability guarantee using conformal e-variables.
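Written as a formula, the reliability guarantee in the last item takes roughly the form below. This assumes the ratio described in the abstract is read as true FDP over estimated FDP, the direction in which the estimate acts as an upper bound. The notation is illustrative and not taken from the paper: $\hat{S}$ is the selected set, $\hat{\alpha}$ is the FDP estimate reported for it, and $\mathcal{H}_0$ is the set of test inputs whose outcomes fail the quality requirement.

```latex
\mathbb{E}\!\left[ \frac{\mathrm{FDP}(\hat{S})}{\hat{\alpha}} \right] \le 1,
\qquad
\mathrm{FDP}(\hat{S}) = \frac{|\hat{S} \cap \mathcal{H}_0|}{\max(|\hat{S}|,\, 1)}
```

A bound of this kind constrains the ratio on average, which is why the abstract describes the average estimated FDP as an upper bound on the true FDR only "to first order".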

Why it matters

Existing CS methods fix the target FDR level before any data are observed, so the user cannot adjust the trade-off between selection size and FDR afterward. PH-CS lets researchers decide how many candidates to select, and at what estimated FDP, after seeing the strength of the evidence and weighing their follow-up resources. The abstract cites genomics and neuroimaging as fields where this kind of data-dependent decision is already common practice.

Original Abstract

Conformal selection (CS) uses calibration data to identify test inputs whose unobserved outcomes are likely to satisfy a pre-specified minimal quality requirement, while controlling the false discovery rate (FDR). Existing methods fix the target FDR level before observing data, which prevents the user from adapting the balance between number of selected test inputs and FDR to downstream needs and constraints based on the available data. For example, in genomics or neuroimaging, researchers often inspect the distribution of test statistics, and decide how aggressively to pursue candidates based on observed evidence strength and available follow-up resources. To address this limitation, we introduce post-hoc CS (PH-CS), which generates a path of candidate selection sets, each paired with a data-driven false discovery proportion (FDP) estimate. PH-CS lets the user select any operating point on this path by maximizing a user-specified utility, arbitrarily balancing selection size and FDR. Building on conformal e-variables and the e-Benjamini-Hochberg (e-BH) procedure, PH-CS is proved to provide a finite-sample post-hoc reliability guarantee whereby the ratio between estimated FDP level and true FDP is, on average, upper bounded by $1$, so that the average estimated FDP is, to first order, a valid upper bound on the true FDR. PH-CS is extended to control quality defined in terms of a general risk. Experiments on synthetic and real-world datasets demonstrate that, unlike CS, PH-CS can consistently satisfy user-imposed utility constraints while producing reliable FDP estimates and maintaining competitive FDR control.
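The path-then-choose idea in the abstract can be illustrated with a minimal sketch built on the standard e-BH rule, which at level alpha selects the largest k whose k-th largest e-value is at least n / (alpha * k). This is not the paper's algorithm: the e-values are taken as given rather than built from calibration data, and the function names `ebh_path` and `pick_operating_point`, the level formula used for each top-k set, and the example utility are all assumptions made for illustration.

```python
import numpy as np

def ebh_path(e_values):
    """Hypothetical helper: return [(indices of top-k inputs, level_k)] for k = 1..n.

    level_k = n / (k * e_(k)) is the smallest alpha at which the e-BH threshold
    n / (alpha * k) is met by the k-th largest e-value e_(k). It plays the role
    of an FDP estimate for the top-k set; values above 1 are vacuous.
    """
    e = np.asarray(e_values, dtype=float)
    n = e.size
    order = np.argsort(-e, kind="stable")   # indices by decreasing e-value
    sorted_e = e[order]
    ks = np.arange(1, n + 1)
    with np.errstate(divide="ignore"):      # an e-value of 0 yields level = inf
        levels = n / (ks * sorted_e)
    return [(order[:k], float(levels[k - 1])) for k in ks]

def pick_operating_point(path, utility):
    """Hypothetical helper: return the (set, level) pair maximizing utility(size, level)."""
    return max(path, key=lambda pair: utility(len(pair[0]), pair[1]))

# Illustrative e-values for five test inputs (made-up numbers, not from the paper).
e_values = [20.0, 0.5, 8.0, 1.0, 4.0]
path = ebh_path(e_values)

# Example utility (an assumption): reward set size, penalize the estimated
# number of false discoveries twice as heavily.
selected, level = pick_operating_point(path, lambda size, a: size * (1.0 - 2.0 * a))
print(sorted(int(i) for i in selected), level)
```

Because the operating point is chosen after the whole path has been seen, a fixed-level FDR guarantee no longer applies directly, which is the gap the post-hoc guarantee in the abstract is meant to cover.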
