The fitness landscape of overlapping genes
Orson Kirsch, Nicole Wood, Steven A Redford, Kabir Husain
TLDR
This paper explores the fitness landscape of overlapping genes, revealing compatibility, trade-offs, and the natural genetic code's unique suitability.
Key contributions
- Developed a computational method to design de novo sequences for two overlapping proteins.
- Identified a fundamental trade-off and a simple criterion for overlapping gene feasibility.
- Showed widespread compatibility between protein families, with specific reading frames being more permissive.
- Demonstrated the natural genetic code's unique suitability for supporting overlapping genes.
Why it matters
This paper clarifies the evolutionary mechanisms and design principles of overlapping genes. It reveals protein fitness landscape flexibility and the natural genetic code's unique suitability, advancing our understanding of genome evolution and protein design.
Original Abstract
Natural genomes sometimes encode two different proteins in staggered reading frames of the same DNA sequence. Despite the prevalence of these 'overlapping genes' across the tree of life, it remains unknown whether arbitrary protein pairs can overlap, to what extent such overlaps are feasible, or what design principles govern them. Here, we study compatibility, frustration, and connectivity in the fitness landscape of overlapping genes. We computationally design sequences de novo that satisfy the dual functional constraints of two distinct protein families. The joint fitness landscape, inferred via Potts models from multiple sequence alignments, reveals a fundamental trade-off between the two proteins and provides a simple criterion for when overlap is feasible. We find widespread compatibility between protein families, with one class of reading frames markedly more permissible than others. By exploring alternative genetic codes, we find that the natural genetic code is uniquely well-suited to support overlapping genes. Constructing mutational paths between sequences, we find that sequence-diverse overlapped genes can be connected via a network of near-neutral mutations. Overall, our results suggest that protein fitness landscapes are sufficiently flexible so as to accommodate the stringent, orthogonal requirements of overlapping genes.
📬 Weekly AI Paper Digest
Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.