Conditional Monte Carlo Tree Diffusion for Designing Cell-Type-Specific and Biologically Faithful Regulatory DNA
Animesh Awasthi, Raphael Bednarsky, Moritz Schaefer, Christoph Bock
TLDR
DNA-CRAFT uses conditional Monte Carlo tree diffusion to design highly cell-type-specific and biologically faithful regulatory DNA elements.
Key contributions
- Introduces DNA-CRAFT, a generative framework for cell-type-specific regulatory DNA design.
- Integrates class-conditioned discrete diffusion with Monte Carlo tree search.
- Trains on the ENCODE registry of 3.2 million candidate regulatory elements, then conditions the model on class-specific regulatory grammars (e.g., enhancers and promoters).
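- Generates sequences with high predicted cell-type-specific activity and biological fidelity, achieving the best trade-offs against diffusion, autoregressive, and gradient-based optimization methods on benchmarks for human cell lines and immune cell types.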
Why it matters
Designing precise regulatory DNA is crucial for cell engineering and gene therapy. Existing generative methods struggle to combine high specificity against undesired cell types with adherence to the genome's natural regulatory grammar. DNA-CRAFT targets both by pairing a class-conditioned diffusion model with inference-time tree-search guidance, which could help accelerate the development of targeted therapies.
Original Abstract
Designing regulatory DNA elements with precise cell-type-specific activity is broadly relevant for cell engineering and gene therapy. Deep generative models can generate functional gene-regulatory elements, but existing methods struggle to achieve high specificity against undesired cell types while adhering to the genome's natural regulatory grammar. Here, we introduce DNA-CRAFT, a generative framework that integrates class-conditioned discrete diffusion with Monte Carlo tree search to design cell-type-specific and biologically faithful regulatory elements. We first train a discrete diffusion model on the ENCODE registry of 3.2 million candidate regulatory elements. Second, we condition the model to learn class-specific regulatory grammars of naturally occurring DNA sequences, including enhancers and promoters. Third, we employ conditional Monte Carlo tree guidance, an inference-time alignment algorithm designed to maximize the differential regulatory activity between desired and undesired cell types. By benchmarking DNA-CRAFT on regulatory sequence design tasks for human cell lines and immune cell types, we demonstrate that our model generates sequences with high predicted cell-type-specific activity and biological fidelity, achieving the best trade-offs compared to methods that use diffusion, autoregressive models, and gradient-based optimization.
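To make the guidance objective in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It scores a sequence by its activity in a desired cell type minus the mean activity in undesired ones, then fills in a masked sequence position by position, choosing each base by averaged random rollouts. Everything here is an assumption for illustration: the motif-counting `toy_activity` stands in for a trained activity predictor, uniform random fills stand in for the diffusion denoiser, and flat Monte Carlo rollouts stand in for the full tree search.

```python
import random

ALPHABET = "ACGT"
MASK = "N"

# Hypothetical stand-in for learned activity predictors: each cell type
# "responds" to one made-up motif. A real setup would use trained models.
TOY_MOTIFS = {"target": "GATA", "off_target_1": "CACG", "off_target_2": "TTGC"}


def toy_activity(seq, cell_type):
    """Count (possibly overlapping) motif occurrences as a fake activity score."""
    motif = TOY_MOTIFS[cell_type]
    return sum(seq.startswith(motif, i) for i in range(len(seq) - len(motif) + 1))


def differential_reward(seq, desired, undesired):
    """Activity in the desired cell type minus the mean over undesired ones."""
    off = sum(toy_activity(seq, c) for c in undesired) / len(undesired)
    return toy_activity(seq, desired) - off


def random_completion(partial, rng):
    """Fill every masked position uniformly at random (one rollout)."""
    return "".join(rng.choice(ALPHABET) if ch == MASK else ch for ch in partial)


def rollout_guided_design(length, desired, undesired, n_rollouts=30, seed=0):
    """Unmask left to right; pick each base by its mean rollout reward."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for pos in range(length):
        best_base, best_value = None, float("-inf")
        for base in ALPHABET:
            seq[pos] = base
            partial = "".join(seq)
            # Estimate the value of this choice by averaging random completions.
            value = sum(
                differential_reward(random_completion(partial, rng), desired, undesired)
                for _ in range(n_rollouts)
            ) / n_rollouts
            if value > best_value:
                best_base, best_value = base, value
        seq[pos] = best_base
    return "".join(seq)


off_targets = ["off_target_1", "off_target_2"]
designed = rollout_guided_design(16, "target", off_targets)
print(designed, differential_reward(designed, "target", off_targets))
```

Averaging over undesired cell types is one possible choice; penalizing the maximum off-target activity instead would be a stricter notion of specificity. The abstract does not say which form the authors use.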