ECLIPSE: A Composable Pipeline for Predicting ecDNA Formation, Evolution, and Therapeutic Vulnerabilities in Cancer
TLDR
ECLIPSE is a three-module computational framework for predicting ecDNA formation, evolution, and therapeutic vulnerabilities in cancer, designed to avoid the feature leakage that inflated earlier ecDNA benchmarks.
Key contributions
- ecDNA-Former predicts ecDNA status using standard genomic features, achieving AUROC 0.812 without specialized sequencing.
- CircularODE models ecDNA's stochastic dynamics with physics-constrained neural SDEs, reaching r > 0.997 on experimental data via zero-shot transfer.
- VulnCausal identifies therapeutic vulnerabilities via causal inference, achieving 80x enrichment over chance and 3.7x higher validation than standard approaches.
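To make "enrichment over chance" concrete, here is a minimal sketch of one common way to compute fold enrichment: the hit rate among the top-ranked candidates divided by the hit rate across all candidates. The summary does not state the paper's exact definition, so the function name `fold_enrichment`, its parameters, and the toy gene lists are assumptions for illustration only.

```python
def fold_enrichment(ranked_ids, true_hits, top_k):
    """Hit rate in the top_k ranked candidates divided by the background hit rate.

    Hypothetical helper: one common definition of enrichment over chance,
    not necessarily the one used by VulnCausal.
    """
    if not 0 < top_k <= len(ranked_ids):
        raise ValueError("top_k must be between 1 and the number of candidates")
    hits = set(true_hits)
    # Background rate: fraction of all candidates that are validated hits.
    background_rate = len(hits & set(ranked_ids)) / len(ranked_ids)
    if background_rate == 0:
        raise ValueError("no validated hits among the candidates")
    # Top-k rate: fraction of the k highest-ranked candidates that are hits.
    top_rate = sum(1 for g in ranked_ids[:top_k] if g in hits) / top_k
    return top_rate / background_rate


# Toy data (made up, not from the paper): 1000 ranked candidates, 10 validated hits,
# 4 of which fall in the top 5.
ranked = [f"gene{i}" for i in range(1000)]
validated = {f"gene{i}" for i in range(4)} | {f"gene{i}" for i in range(500, 506)}
fold = fold_enrichment(ranked, validated, top_k=5)
print(f"fold enrichment: {fold:.1f}")
```

In this toy case the top-5 hit rate is 0.8 against a background rate of 0.01, so a ranking this good would count as roughly 80-fold enrichment under this definition.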
Why it matters
ecDNA drives tumor evolution and therapy resistance in roughly 30% of aggressive cancers, yet earlier computational benchmarks trained models on features that already require knowing ecDNA status, inflating reported AUROC from 0.724 to 0.967. ECLIPSE removes that leakage and sets new baselines for ecDNA analysis, arguing that in biomedical ML, methodological rigor matters more than architectural innovation.
Original Abstract
Extrachromosomal DNA (ecDNA) represents one of the most pressing challenges in cancer biology: circular DNA structures that amplify oncogenes, evade targeted therapies, and drive tumor evolution in ~30% of aggressive cancers. Despite its clinical importance, computational ecDNA research has been built on broken foundations. We discover that existing benchmarks suffer from circular reasoning -- models trained on features that already require knowing ecDNA status -- artificially inflating performance from AUROC 0.724 to 0.967. We introduce ECLIPSE, the first methodologically sound framework for ecDNA analysis, comprising three modules that transform how we predict, model, and target these structures. ecDNA-Former achieves AUROC 0.812 using only standard genomic features, demonstrating for the first time that ecDNA status is predictable without specialized sequencing, and that careful feature curation matters more than complex architectures. CircularODE captures ecDNA's unique stochastic dynamics through physics-constrained neural SDEs, achieving r > 0.997 on experimental data via zero-shot transfer. VulnCausal applies causal inference to identify therapeutic vulnerabilities, achieving 80x enrichment over chance and 3.7x higher validation than standard approaches by filtering spurious correlations. Together, these modules establish rigorous baselines for an emerging application area and reveal a broader lesson: in high-stakes biomedical ML, methodological rigor -- eliminating leakage, encoding domain physics, addressing confounding -- outweighs architectural innovation. ECLIPSE provides both the tools and the template for principled computational oncology.
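The circular-reasoning problem the abstract describes can be illustrated with a small synthetic sketch: a feature that is derived from the label itself yields a much higher AUROC than a feature that is only weakly related to it. Everything here is an assumption for illustration (the `auroc` helper, the Gaussian toy features, the sample size, and the seed); it does not reproduce the paper's data, features, or reported numbers.

```python
import random


def auroc(scores, labels):
    """AUROC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic labels: about 30% of samples are "ecDNA-positive".
labels = [1 if rng.random() < 0.3 else 0 for _ in range(400)]

# "Honest" feature: only weakly shifted by the label, standing in for a
# standard genomic feature that can be measured without knowing ecDNA status.
honest = [rng.gauss(0.8 * y, 1.0) for y in labels]

# "Leaky" feature: strongly shifted by the label, standing in for a feature
# that could only be computed once ecDNA status was already known.
leaky = [rng.gauss(3.0 * y, 1.0) for y in labels]

honest_auc = auroc(honest, labels)
leaky_auc = auroc(leaky, labels)
print(f"honest feature AUROC: {honest_auc:.3f}")
print(f"leaky feature AUROC:  {leaky_auc:.3f}")
```

The gap between the two scores reflects how much label information the feature carries, not model quality, which is why a benchmark built on such features overstates what is achievable in practice.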