ArXiv TLDR

Combining Bayesian and Frequentist Inference for Laboratory-Specific Performance Guarantees in Copy Number Variation Detection

arXiv:2604.14305

Austin Talbot, Alex V. Kotlar, Yue Ke

stat.ME, cs.LG, q-bio.GN, stat.AP

TLDR

A hybrid Bayesian-frequentist method provides accurate, lab-specific performance guarantees for copy number variation detection in oncology panels.

Key contributions

  • Proposes a hybrid framework for frequentist performance guarantees from Bayesian CNV callers.
  • Introduces imputation to remove true CNV influence without requiring known ground truth.
  • Uses regularization to handle small-sample variability and stratification on the log model evidence to handle non-exchangeable noise from process mismatch.
  • Achieves single-digit mean absolute coverage error across all genes, whereas Bayesian comparators exceed 60% mean absolute error on clinically relevant genes such as ERBB2.

Why it matters

Clinical validation requires frequentist, population-level guarantees (coverage rates, false-positive bounds, and minimum detectable copy-number changes), which the per-sample uncertainty reported by Bayesian CNV callers does not directly provide. This paper bridges that gap for targeted amplicon panels, giving laboratories per-gene performance guarantees even when validation samples are limited and amplicon counts per gene are small.

Original Abstract

Targeted amplicon panels are widely used in oncology diagnostics, but providing per-gene performance guarantees for copy number variant (CNV) detection remains challenging due to amplification artifacts, process-mismatch heterogeneity, and limited validation sample sizes. While Bayesian CNV callers naturally quantify per-sample uncertainty, translating this into the frequentist population-level guarantees required for clinical validation (coverage rates, false-positive bounds, and minimum detectable copy-number changes) is a fundamentally different inferential problem. We show empirically that even robust Bayesian credible intervals, including coarsened posteriors and sandwich-adjusted intervals, are severely miscalibrated on panels with small amplicon counts per gene. To address this, we propose a hybrid framework that evaluates Bayesian posterior functionals on validation samples and models the resulting squared losses with a Gamma distribution, yielding tolerance intervals with valid frequentist coverage. Three components make the method practical under real-world constraints: (1) imputation that removes the influence of true CNV-positive samples without requiring known ground truth, (2) regularization to address small sample variability, and (3) evidence-based stratification on the log model evidence to accommodate non-exchangeable noise profiles arising from process mismatch. Evaluated on two targeted amplicon panels using leave-one-out cross-validation, the proposed method achieves single-digit mean absolute coverage error across all genes under both process-matched and unmatched conditions, whereas Bayesian comparators exhibit mean absolute errors exceeding 60% on clinically relevant genes such as ERBB2.
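
The core step described in the abstract, fitting a Gamma distribution to squared losses and turning it into a tolerance interval, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the parametric bootstrap used to account for estimation error, and the symmetric interval around a point estimate are all assumptions made for illustration, and the sketch omits the paper's imputation, regularization, and evidence-based stratification.

```python
import numpy as np
from scipy import stats


def gamma_upper_tolerance_bound(sq_losses, content=0.95, confidence=0.95,
                                n_boot=500, seed=0):
    """Approximate one-sided upper tolerance bound for squared losses.

    Assumption (not from the paper): a Gamma distribution with location
    fixed at 0 is fitted by maximum likelihood, and a parametric bootstrap
    of the `content` quantile stands in for an exact tolerance factor.
    """
    x = np.asarray(sq_losses, dtype=float)
    if x.ndim != 1 or x.size < 3 or np.any(x <= 0):
        raise ValueError("need at least 3 strictly positive squared losses")

    rng = np.random.default_rng(seed)
    shape, _, scale = stats.gamma.fit(x, floc=0)

    # Re-estimate the `content` quantile on data simulated from the fit.
    boot_q = np.empty(n_boot)
    for b in range(n_boot):
        xb = rng.gamma(shape, scale, size=x.size)
        sb, _, cb = stats.gamma.fit(xb, floc=0)
        boot_q[b] = stats.gamma.ppf(content, sb, scale=cb)

    # Upper `confidence` quantile of the bootstrapped quantile estimates.
    return float(np.quantile(boot_q, confidence))


def tolerance_interval(estimate, sq_loss_bound):
    """Symmetric interval: estimate +/- sqrt(bound on squared loss)."""
    half_width = float(np.sqrt(sq_loss_bound))
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    # Synthetic squared losses standing in for (posterior mean - reference)^2
    # on validation samples of one gene; the values are made up.
    demo_rng = np.random.default_rng(1)
    losses = demo_rng.gamma(2.0, 0.01, size=40)
    bound = gamma_upper_tolerance_bound(losses)
    print(tolerance_interval(2.0, bound))
```

In this sketch a laboratory would compute the bound once per gene from its own validation samples and then report `tolerance_interval(estimate, bound)` for each new sample. The bootstrap step is a simple substitute for an exact Gamma tolerance factor and makes no claim to reproduce the coverage results reported in the paper.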
