LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows
Jeongchan Kim, Yunkyung Ko, Jong Chul Ye
TLDR
LPDP enables training-free, inference-time reward control for variable-length DNA generation using biologically plausible edit flows.
Key contributions
- Introduces LPDP, a training-free method for inference-time reward control in DNA generation.
- Generates variable-length DNA using biologically plausible insertion, deletion, and substitution edits.
- Re-ranks candidate edits by solving a local discrete program around child sequences.
- Demonstrated for enhancer optimization and exon-intron-exon inpainting tasks.
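The insertion, deletion, and substitution edits underlying the method can be illustrated with a minimal sketch. The `Edit` dataclass and `apply_edit` helper below are hypothetical names for illustration, not the authors' implementation; they only show how each typed edit maps a parent DNA sequence to a child sequence of possibly different length.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edit:
    op: str        # "sub", "ins", or "del"
    pos: int       # position in the sequence
    base: str = "" # new base for "sub"/"ins"; unused for "del"

def apply_edit(seq: str, e: Edit) -> str:
    """Return the child sequence produced by one typed edit action."""
    if e.op == "sub":   # substitution keeps the length fixed
        return seq[:e.pos] + e.base + seq[e.pos + 1:]
    if e.op == "ins":   # insertion grows the sequence by one base
        return seq[:e.pos] + e.base + seq[e.pos:]
    if e.op == "del":   # deletion shrinks it by one base
        return seq[:e.pos] + seq[e.pos + 1:]
    raise ValueError(f"unknown op: {e.op}")

seq = "ACGTAC"
print(apply_edit(seq, Edit("sub", 1, "T")))  # ATGTAC
print(apply_edit(seq, Edit("ins", 2, "G")))  # ACGGTAC
print(apply_edit(seq, Edit("del", 0)))       # CGTAC
```

Because insertions and deletions change sequence length, rollouts built from these actions explore a variable-length sequence space, which fixed-length guidance frameworks cannot do.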
Why it matters
This paper addresses the limitation of fixed-length DNA generation by introducing LPDP, a training-free method for controlling variable-length sequence generation. It guides DNA synthesis at inference time using biologically plausible edit operations, with significant implications for designing functional DNA, such as optimizing enhancers or fine-tuning splice sites.
Original Abstract
We study the application of recent Edit Flows for inference-time reward control for DNA sequence generation. Unlike most reward-guided DNA generation frameworks, which operate on fixed-length sequence spaces, Edit Flows have the potential to generate variable-length DNA through biologically plausible insertion, deletion, and substitution operations. In particular, we propose Local Perturbation Discrete Programming (LPDP), a training-free, intermediate-state and action-aware local re-solving operator for variable-length DNA edit-action generators at inference time. More specifically, at each guided rollout step, LPDP scores one-step root edits, retains a near-best root band, and re-ranks each retained root by solving a bounded local discrete program around its child sequence. This local program uses the typed geometry of edit actions to focus on coherent substitution, insertion, or deletion subgraphs, and aggregates local continuations with either a hard Max backup or a soft log-sum-exponential (LSE) backup. We instantiate LPDP in two regimes: front-loaded reward tilting for enhancer optimization, where early edits are critical for establishing global regulatory sequence structure, and back-loaded reward tilting for exon-intron-exon inpainting, where late edits fine-tune splice-boundary contexts.
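The rollout step described in the abstract (score root edits, keep a near-best band, re-solve a bounded local program per root, aggregate with a Max or LSE backup) can be sketched as follows. This is a hypothetical reconstruction under stated assumptions: `reward` is a user-supplied sequence scorer, `propose_edits` enumerates candidate one-step edits, and `apply_edit` returns the edited child sequence; none of these names come from the paper, and the bounded local program is simplified here to a depth-limited lookahead over proposed edits.

```python
import math

def lpdp_step(seq, reward, propose_edits, apply_edit,
              band_eps=0.5, depth=1, backup="lse", tau=1.0):
    """One illustrative LPDP-guided rollout step (sketch, not the paper's code)."""
    # 1) Score all one-step "root" edits from the current sequence.
    scored = [(reward(apply_edit(seq, e)), e, apply_edit(seq, e))
              for e in propose_edits(seq)]
    best = max(r for r, _, _ in scored)
    # 2) Retain a near-best band of roots.
    band = [(r, e, c) for r, e, c in scored if r >= best - band_eps]

    # 3) Re-rank each retained root by aggregating local continuations
    #    around its child with a hard Max or soft LSE backup.
    def backup_value(child, d):
        if d == 0:
            return reward(child)
        vals = [backup_value(apply_edit(child, e), d - 1)
                for e in propose_edits(child)]
        if not vals:
            return reward(child)
        if backup == "max":
            return max(vals)                                   # hard backup
        return tau * math.log(sum(math.exp(v / tau) for v in vals))  # soft LSE

    rescored = [(backup_value(c, depth), e, c) for _, e, c in band]
    _, best_edit, best_child = max(rescored, key=lambda t: t[0])
    return best_edit, best_child
```

The two backup choices trade off differently: the hard Max backup commits to the single best local continuation, while the soft LSE backup (temperature `tau`) credits roots whose neighborhoods contain many good continuations, which is gentler when the reward model is noisy.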