ArXiv TLDR

Co-Generative De Novo Functional Protein Design

arXiv:2605.00948

Xinrui Chen, Yizhen Luo, Siqi Fan, Zaiqing Nie

q-bio.QM, cs.AI

TLDR

CodeFP is a co-generative protein language model that designs functional proteins by simultaneously decoding sequence and structure, improving functionality and foldability.

Key contributions

  • Introduces CodeFP, a co-generative protein language model for de novo functional protein design.
  • Simultaneously decodes sequence and structure tokens to achieve superior functionality and foldability.
  • Employs functional local structures and auxiliary supervision to enhance encoding and reduce training ambiguity.
  • Achieves average improvements of 6.1% in functional consistency and 3.2% in foldability over the strongest baseline.
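
To make "simultaneously decodes sequence and structure tokens" concrete, here is a minimal sketch of a co-generative decoding loop. It is not the authors' implementation: the function names, the structure-codebook size, and the random stand-in for the model are all hypothetical, chosen only to show that each decoding step emits one residue and one structure token together, so both streams stay aligned position by position.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
NUM_STRUCT_TOKENS = 8                  # hypothetical structure-codebook size


def toy_joint_step(function_label, seq_prefix, struct_prefix, rng):
    """Hypothetical stand-in for a trained model.

    A real co-generative model would condition on the function label and
    on both prefixes. This toy ignores its inputs and returns random
    weights for the two output heads.
    """
    aa_weights = [rng.random() for _ in AMINO_ACIDS]
    struct_weights = [rng.random() for _ in range(NUM_STRUCT_TOKENS)]
    return aa_weights, struct_weights


def co_generate(function_label, length, seed=0):
    """Decode a residue and a structure token at every position."""
    rng = random.Random(seed)
    seq, struct = [], []
    for _ in range(length):
        aa_w, st_w = toy_joint_step(function_label, seq, struct, rng)
        # Both tokens for this position are sampled in the same step.
        seq.append(rng.choices(AMINO_ACIDS, weights=aa_w, k=1)[0])
        struct.append(rng.choices(range(NUM_STRUCT_TOKENS), weights=st_w, k=1)[0])
    return "".join(seq), struct


sequence, structure_tokens = co_generate("hydrolase", length=12)
print(sequence, structure_tokens)
```

By contrast, a decoupled pipeline would finish one stream (for example the structure) before starting the other; the loop above interleaves them so each step can see both prefixes.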

Why it matters

This paper tackles the challenge of designing proteins that are both functional and foldable, a combination that direct function-to-sequence and decoupled structure-sequence approaches often fail to achieve. By decoding sequence and structure tokens jointly, CodeFP reports gains on both criteria over the strongest baseline. This could accelerate the development of novel proteins for biotechnology and medicine.

Original Abstract

De novo functional protein design aims to generate protein sequences that realize specified biochemical functions without relying on evolutionary templates, enabling broad applications in biotechnology and medicine. Existing approaches adopt either direct function-to-sequence mapping or decoupled structure-sequence generation strategies but often fail to achieve functionality and foldability simultaneously. To address this, we propose CodeFP, a Co-generative protein language model for de novo Functional Protein design that simultaneously decodes sequence and structure tokens, thereby enabling superior simultaneous realization of functionality and foldability. CodeFP utilizes functional local structures to enrich functional semantic encodings, overcoming the suboptimal translation of flat encodings into structure tokens, while introducing auxiliary functional supervision to alleviate training ambiguity stemming from the one-to-many structure-to-token mapping. Extensive experiments show that CodeFP consistently achieves average improvements of 6.1% in functional consistency and 3.2% in foldability over the strongest baseline.
