ArXiv TLDR

CAGenMol: Condition-Aware Diffusion Language Model for Goal-Directed Molecular Generation

arXiv: 2604.11483

Yanting Li, Zhuoyang Jiang, Enyan Dai, Lei Wang, Wen-Cai Ye, et al.

cs.LG · q-bio.QM

TLDR

CAGenMol is a condition-aware discrete diffusion language model for goal-directed molecular generation that uses reinforcement learning to align generation with non-differentiable objectives while handling heterogeneous constraints.

Key contributions

  • Introduces CAGenMol, a condition-aware discrete diffusion framework for molecular generation.
  • Couples discrete diffusion with reinforcement learning to optimize non-differentiable objectives.
  • Effectively handles heterogeneous constraints and conflicting objectives in molecular design.
  • Achieves state-of-the-art results in binding affinity, drug-likeness, and success rates.
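
The diffusion-plus-RL coupling in the second bullet can be sketched in miniature. The snippet below is a toy illustration, not the paper's implementation: a per-position categorical "denoiser" over an invented token vocabulary is trained with REINFORCE (a standard policy-gradient estimator with a mean-reward baseline) against a non-differentiable reward. The vocabulary, reward function, and all names are stand-in assumptions.

```python
import math
import random

random.seed(0)

# Toy stand-ins (assumptions, not from the paper): a tiny SMILES-like vocabulary.
VOCAB = ["C", "N", "O", "=", "(", ")"]
LENGTH = 8

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

class ToyDenoiser:
    """Per-position categorical model standing in for the diffusion language model."""
    def __init__(self):
        self.logits = [[0.0] * len(VOCAB) for _ in range(LENGTH)]

    def sample(self):
        """One reverse step from a fully masked sequence: sample every position."""
        seq = []
        for pos in range(LENGTH):
            probs = softmax(self.logits[pos])
            j = random.choices(range(len(VOCAB)), weights=probs)[0]
            seq.append(VOCAB[j])
        return seq

def reward(tokens):
    """Non-differentiable objective (stand-in for a docking or QED score)."""
    return tokens.count("C")

def reinforce_step(model, lr=0.5, batch=16):
    """Policy-gradient (REINFORCE) update with a mean-reward baseline."""
    seqs = [model.sample() for _ in range(batch)]
    rewards = [reward(s) for s in seqs]
    baseline = sum(rewards) / batch
    for seq, r in zip(seqs, rewards):
        adv = r - baseline
        for pos, tok in enumerate(seq):
            probs = softmax(model.logits[pos])
            for j, v in enumerate(VOCAB):
                # grad of log p(tok) w.r.t. logit j: indicator minus probability
                grad = (1.0 if v == tok else 0.0) - probs[j]
                model.logits[pos][j] += lr * adv * grad / batch
    return baseline

model = ToyDenoiser()
history = [reinforce_step(model) for _ in range(200)]
```

The mean batch reward in `history` rises as the sampler is pushed toward high-reward sequences, without ever differentiating through the reward itself — the same property that lets the real method target docking scores or success-rate-style objectives.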

Why it matters

This paper introduces a novel approach to goal-directed molecular generation, addressing key limitations of existing methods. By combining discrete diffusion with reinforcement learning, CAGenMol can navigate complex chemical spaces and optimize multiple, often conflicting, drug properties. This advancement could accelerate drug discovery by generating more effective and valid molecules.

Original Abstract

Goal-directed molecular generation requires satisfying heterogeneous constraints such as protein–ligand compatibility and multi-objective drug-like properties, yet existing methods often optimize these constraints in isolation, failing to reconcile conflicting objectives (e.g., affinity vs. safety), and struggle to navigate the non-differentiable chemical space without compromising structural validity. To address these challenges, we propose CAGenMol, a condition-aware discrete diffusion framework over molecular sequences that formulates molecular design as conditional denoising guided by heterogeneous structural and property signals. By coupling discrete diffusion with reinforcement learning, the model aligns the generation trajectory with non-differentiable objectives while preserving chemical validity and diversity. The non-autoregressive nature of the diffusion language model further enables iterative refinement of molecular fragments at inference time. Experiments on structure-conditioned, property-conditioned, and dual-conditioned benchmarks demonstrate consistent improvements over state-of-the-art methods in binding affinity, drug-likeness, and success rate, highlighting the effectiveness of our framework.
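
The abstract's inference-time refinement can likewise be sketched: because generation is non-autoregressive, any fragment can be remasked and re-denoised while the rest of the sequence is held fixed. The following toy is an assumption-laden stand-in (the uniform `denoise` would be the trained condition-aware diffusion LM in the real method, and the scorer is invented): it remasks a random window, resamples it, and keeps candidates whose non-differentiable score does not decrease.

```python
import random

random.seed(1)

VOCAB = ["C", "N", "O"]  # toy token set (assumption)
MASK = None

def score(tokens):
    """Stand-in for a non-differentiable property scorer."""
    return tokens.count("C") - tokens.count("N")

def denoise(seq):
    """Toy denoiser: fill each masked slot uniformly at random.
    In CAGenMol this role is played by the trained diffusion language model."""
    return [random.choice(VOCAB) if t is MASK else t for t in seq]

def refine(tokens, rounds=50, frag=3):
    """Remask a random fragment, re-denoise it, keep non-worsening candidates."""
    best, best_s = tokens[:], score(tokens)
    for _ in range(rounds):
        i = random.randrange(len(best) - frag + 1)
        masked = best[:i] + [MASK] * frag + best[i + frag:]
        cand = denoise(masked)
        s = score(cand)
        if s >= best_s:
            best, best_s = cand, s
    return best, best_s

start = ["N"] * 8
refined, refined_score = refine(start)
```

Holding the unmasked context fixed is what an autoregressive generator cannot easily do: it would have to regenerate everything to the right of the edited fragment.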
