Direct RNA sequence design under codon constraints using expressive tensor-based secondary structure models
Mark Fornace, Christina Wuyan Wang, Michael Lindsey
TLDR
A new algorithm samples codon-constrained RNA sequences directly from a Boltzmann distribution defined by the codon sequence and a fully detailed, tensor-based secondary structure free energy model.
Key contributions
- Introduces a direct and efficient algorithm for sampling RNA sequences from a Boltzmann distribution.
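- Defines the design distribution jointly in terms of the codon sequence and a fully detailed secondary structure free energy model.
- Builds on a recently developed tensor-based formulation of secondary structure thermodynamics to show, for the first time, that global sequence design can be carried out with respect to a highly accurate free energy model.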
- Provides exact computation of statistical quantities such as free energies, base pairing probabilities, and base and codon marginals.
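To make "direct sampling" and "exact computation" concrete, here is a minimal Python sketch on a toy model. It is not the paper's algorithm: the energy below is an invented chain model (a per-codon term plus a coupling between adjacent codons) with no secondary structure at all, and the names `site_energy`, `pair_energy`, `partition_function`, `free_energy`, and `sample_sequence` are hypothetical. It shows only the general idea that a partition function computed by dynamic programming allows drawing exact, independent samples from a Boltzmann distribution over synonymous codon sequences.

```python
import math
import random
from itertools import product

# Partial codon table (standard genetic code, RNA alphabet), for illustration only.
CODONS = {
    "M": ["AUG"],
    "F": ["UUU", "UUC"],
    "K": ["AAA", "AAG"],
    "G": ["GGU", "GGC", "GGA", "GGG"],
}

def site_energy(codon):
    # Toy assumption: every base that is not G or C costs one energy unit.
    return sum(1.0 for base in codon if base not in "GC")

def pair_energy(left, right):
    # Toy assumption: penalize a repeated base across a codon boundary.
    return 0.5 if left[-1] == right[0] else 0.0

def backward_messages(protein, beta):
    # msgs[i][c] sums the Boltzmann weights of all codon choices after
    # position i, given that position i uses codon c.
    options = [CODONS[aa] for aa in protein]
    n = len(options)
    msgs = [None] * n
    msgs[n - 1] = {c: 1.0 for c in options[n - 1]}
    for i in range(n - 2, -1, -1):
        msgs[i] = {
            c: sum(
                math.exp(-beta * (pair_energy(c, d) + site_energy(d))) * msgs[i + 1][d]
                for d in options[i + 1]
            )
            for c in options[i]
        }
    return options, msgs

def partition_function(protein, beta=1.0):
    options, msgs = backward_messages(protein, beta)
    return sum(math.exp(-beta * site_energy(c)) * msgs[0][c] for c in options[0])

def free_energy(protein, beta=1.0):
    return -math.log(partition_function(protein, beta)) / beta

def sample_sequence(protein, beta=1.0, rng=random):
    # Draw codons left to right, each from its exact conditional distribution.
    options, msgs = backward_messages(protein, beta)
    chosen, prev = [], None
    for i, opts in enumerate(options):
        weights = [
            math.exp(-beta * (site_energy(c) + (pair_energy(prev, c) if prev else 0.0)))
            * msgs[i][c]
            for c in opts
        ]
        prev = rng.choices(opts, weights=weights, k=1)[0]
        chosen.append(prev)
    return "".join(chosen)

def brute_force_partition_function(protein, beta=1.0):
    # Exhaustive enumeration, feasible only for very short proteins; used as a check.
    total = 0.0
    for combo in product(*(CODONS[aa] for aa in protein)):
        energy = sum(site_energy(c) for c in combo)
        energy += sum(pair_energy(a, b) for a, b in zip(combo, combo[1:]))
        total += math.exp(-beta * energy)
    return total
```

Calling `sample_sequence("MFKG")` returns a 12-nucleotide RNA string that encodes the peptide M-F-K-G, and `partition_function("MFKG")` agrees with `brute_force_partition_function("MFKG")` up to floating-point error. The dynamic program is this simple only because the toy energy couples neighboring codons alone; a secondary structure model couples distant positions through base pairing, which is the harder setting the paper addresses.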
Why it matters
Codon optimization must navigate a combinatorially large space of codon sequences that encode the same protein, a task central to synthetic biology, mRNA therapeutics, and vaccine design. This work shows that global sequence design can be carried out with respect to a highly accurate free energy model, and its algorithms can use any available CPU and GPU resources in parallel.
Original Abstract
Nucleic acid sequence design via codon optimization is a fundamental task with applications across synthetic biology, mRNA therapeutics, and vaccine design. Given a target protein, it is a major open challenge to navigate the combinatorially large design space of codon sequences mapping to its amino acid sequence. Computational approaches generally seek to optimize simple objectives based on the codon sequence, possibly together with more complicated contributions based on secondary structure analysis. In this work, we demonstrate a direct and efficient algorithm to sample sequences from a suitable Boltzmann distribution defined in terms of the codon sequence and a fully detailed secondary structure free energy model, as well as related algorithms for exact computation of statistical quantities such as free energies, base pairing probabilities, and base and codon marginals. These algorithms draw upon a recently developed tensor-based formulation of secondary structure thermodynamics and demonstrate, for the first time, that global sequence design can be accomplished with respect to a highly accurate free energy model. Moreover, the algorithms can leverage any available CPU and GPU resources in parallel for massive computational speedups.