ArXiv TLDR

FRIGID: Scaling Diffusion-Based Molecular Generation from Mass Spectra at Training and Inference Time

arXiv:2604.16648

Montgomery Bohde, Hongxuan Liu, Mrunali Manjrekar, Magdalena Lederbauer, Shuiwang Ji + 2 more

cs.LG, q-bio.QM

TLDR

FRIGID is a diffusion model that generates molecular structures from mass spectra and achieves state-of-the-art accuracy by scaling both training data and inference-time compute.

Key contributions

  • FRIGID: a diffusion language model that generates molecular structures from mass spectra, conditioned on intermediate fingerprint representations and determined chemical formulae.
  • Trains at the scale of hundreds of millions of unlabeled structures.
  • Introduces inference-time scaling: forward fragmentation models flag spectrum-inconsistent fragments, which are then refined through targeted remasking and denoising.
  • Surpasses 18% Top-1 accuracy on MassSpecGym and triples the Top-1 accuracy of the leading methods on NPLIB1.
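Top-1 accuracy, the headline metric above, is the fraction of test spectra for which the highest-ranked generated structure is the correct one; Top-k generalizes this to the k best candidates. A minimal sketch, assuming each spectrum comes with a ranked list of candidate structures given as strings. The function name `top_k_accuracy` is hypothetical, and the exact-string comparison is a simplification: a real benchmark evaluation would compare canonicalized identifiers (for example, InChIKeys) rather than raw SMILES.

```python
from typing import Sequence


def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   ground_truth: Sequence[str],
                   k: int = 1) -> float:
    """Fraction of spectra whose true structure appears among the top-k candidates.

    Hypothetical helper: structures are compared as plain strings, which
    assumes they were canonicalized beforehand.
    """
    if len(ranked_predictions) != len(ground_truth):
        raise ValueError("need one ranked candidate list per spectrum")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not ground_truth:
        return 0.0
    # Count a hit when the true structure is in the first k ranked candidates.
    hits = sum(truth in preds[:k]
               for preds, truth in zip(ranked_predictions, ground_truth))
    return hits / len(ground_truth)
```

With k=1 only the first candidate per spectrum counts, so adding more samples helps Top-1 only if the ranking places the correct structure first.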

Why it matters

This paper introduces FRIGID, an advance in de novo molecular structure elucidation from mass spectra. Scaling training to hundreds of millions of unlabeled structures and adding fragmentation-guided refinement at inference time substantially improves accuracy, setting new state-of-the-art results on MassSpecGym and NPLIB1. Because performance grows log-linearly with inference-time compute, further gains may come from spending more compute, a promising path for future improvements in drug discovery and chemical analysis.
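Log-linear scaling means accuracy rises roughly linearly in the logarithm of inference-time compute, so every doubling of compute adds about the same fixed increment. The sketch below fits that trend by ordinary least squares to measurements you collect yourself. The function names are hypothetical, the compute unit (for example, the number of refinement rounds or sampled candidates) is an assumption, and the points used to check it are synthetic, not results from the paper.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(compute: Sequence[float],
                   accuracy: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of accuracy = a + b * ln(compute).

    Returns (a, b); b is the accuracy gained per e-fold increase in compute.
    """
    if len(compute) != len(accuracy) or len(compute) < 2:
        raise ValueError("need at least two paired points")
    if any(c <= 0 for c in compute):
        raise ValueError("compute values must be positive")
    xs = [math.log(c) for c in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(accuracy) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("compute values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, accuracy))
    b = sxy / sxx             # slope in ln(compute)
    a = mean_y - b * mean_x   # intercept at compute = 1
    return a, b


def gain_per_doubling(b: float) -> float:
    """Predicted accuracy change each time compute is doubled under the fit."""
    return b * math.log(2)
```

For synthetic points lying exactly on accuracy = 0.05 + 0.02 ln(compute) at compute = 1, 2, 4, 8, 16, the fit recovers a ≈ 0.05 and b ≈ 0.02, or roughly 0.014 (1.4 percentage points) per doubling. Fitting in ln(compute) rather than raw compute is what turns the diminishing returns of a log-linear trend into a straight line.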

Original Abstract

In this work, we present FRIGID, a framework with a novel diffusion language model that generates molecular structures conditioned on mass spectra via intermediate fingerprint representations and determined chemical formulae, training at the scale of hundreds of millions of unlabeled structures. We then demonstrate how forward fragmentation models enable inference-time scaling by identifying spectrum-inconsistent fragments and refining them through targeted remasking and denoising. While FRIGID already achieves strong performance with its diffusion base, inference-time scaling significantly improves its accuracy, surpassing 18% Top-1 accuracy on the challenging MassSpecGym benchmark and tripling the Top-1 accuracy of the leading methods on NPLIB1. Further empirical analyses show that FRIGID exhibits log-linear performance scaling with increasing inference-time compute, opening a promising new direction for continued improvements in de novo structural elucidation. FRIGID code is publicly available at https://github.com/coleygroup/FRIGID
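The refinement step in the abstract can be pictured as a loop: check the current candidate against the spectrum, remask only the positions flagged as inconsistent, let the diffusion model denoise them again, and repeat, so that more rounds mean more inference-time compute. The sketch below is a minimal, hypothetical illustration of that control flow, not FRIGID's implementation. It assumes a candidate is a list of tokens, and `refine`, `denoise`, `find_inconsistent`, and the toy stand-ins are invented names. In FRIGID the checking role belongs to a forward fragmentation model and the denoising role to the diffusion language model; here an oracle that knows the target sequence replaces the checker and a random filler replaces the denoiser, so the loop can run on its own.

```python
import random
from typing import Callable, List, Sequence

MASK = "<mask>"

# Hypothetical interfaces, assumed for illustration only:
#   denoise(tokens, rng)      -> copy of tokens with every MASK filled in
#   find_inconsistent(tokens) -> indices of tokens whose fragments the
#                                forward model cannot match to the spectrum
Denoiser = Callable[[List[str], random.Random], List[str]]
Checker = Callable[[Sequence[str]], List[int]]


def refine(tokens: Sequence[str], denoise: Denoiser, find_inconsistent: Checker,
           max_rounds: int, rng: random.Random) -> List[str]:
    """Remask flagged positions and re-denoise them, up to max_rounds times."""
    current = list(tokens)  # copy so the caller's sequence is not mutated
    for _ in range(max_rounds):
        flagged = set(find_inconsistent(current))
        if not flagged:
            break  # nothing left to fix; stop early and save compute
        # Targeted remasking: only flagged positions are reset; the rest are kept.
        masked = [MASK if i in flagged else tok for i, tok in enumerate(current)]
        current = denoise(masked, rng)
    return current


def toy_denoise(tokens: List[str], rng: random.Random) -> List[str]:
    """Toy stand-in for the diffusion model: fills each MASK with a random atom symbol."""
    return [rng.choice("CNO") if tok == MASK else tok for tok in tokens]


def make_toy_checker(target: Sequence[str]) -> Checker:
    """Toy stand-in for the forward model: an oracle flagging tokens that differ from target."""
    def check(tokens: Sequence[str]) -> List[int]:
        return [i for i, tok in enumerate(tokens) if tok != target[i]]
    return check


if __name__ == "__main__":
    rng = random.Random(0)
    target = list("CCOCC")
    result = refine(list("NNONN"), toy_denoise, make_toy_checker(target), 200, rng)
    print("".join(result))
```

In this sketch, raising `max_rounds` is where extra inference-time compute goes; positions the checker accepts are never resampled, which is what makes the remasking targeted rather than a full restart.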
