A Systematic Survey and Benchmark of Deep Learning for Molecular Property Prediction in the Foundation Model Era

April 17, 2026 · arXiv:2604.16586

Zongru Li, Xingsheng Chen, Honggang Wen, Regina Qianru Zhang, Ming Li + 6 more

cs.LG, cs.AI, q-bio.QM

TLDR

This paper surveys and benchmarks deep learning for molecular property prediction across four paradigms: quantum methods, descriptor-based machine learning, geometric deep learning, and foundation models.

Key contributions

Presents a unified taxonomy for molecular representations and deep learning models.
Benchmarks DL models for molecular property prediction across diverse datasets.
Highlights challenges in current data curation, splitting, and evaluation protocols.
Proposes three future directions for robust, trustworthy molecular AI.

Why it matters

Molecular property prediction is central to drug discovery and materials science, and this survey gives a broad overview of how deep learning is applied to it. It identifies concrete weaknesses in data curation, splitting, and evaluation, and proposes three research directions to address them.

Original Abstract

Molecular property prediction integrates quantum chemistry, cheminformatics, and deep learning to connect molecular structure with physicochemical and biological behavior. This survey traces four complementary paradigms, including Quantum, Descriptor Machine Learning, Geometric Deep Learning, and Foundation Models, and outlines a unified taxonomy linking molecular representations, model architectures, and interdisciplinary applications. Benchmark analyses integrate evidence from both widely used datasets and datasets reflecting industry perspectives, encompassing quantum, physicochemical, physiological, and biophysical domains. The survey examines current standards in data curation, splitting strategies, and evaluation protocols, highlighting challenges including inconsistent stereochemistry, heterogeneous assay sources, and reproducibility limitations under random or poorly defined splits. These observations motivate the modernization of benchmark design toward more transparent, time- and scaffold-aware methodologies. We further propose three forward-looking directions: (i) physics-aware learning embedding quantum consistency, (ii) uncertainty-calibrated foundation models for trustworthy inference, and (iii) realistic multimodal benchmark ecosystems integrating computational and experimental data. Repository: https://github.com/Zongru-Li/Survey-and-Benchmarks-of-DL-for-Molecular-Property-Prediction-in-the-Foundation-Model-Era.
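The abstract contrasts random splits with time- and scaffold-aware ones. The sketch below illustrates that general idea only; it is not taken from the paper or its repository. The function names, the `records` list, and the precomputed `scaffold` labels are all hypothetical placeholders. In practice a scaffold label would typically come from a cheminformatics toolkit (for example, a Murcko scaffold computed with RDKit), which is omitted here to keep the example dependency-free.

```python
from collections import defaultdict


def group_aware_split(items, group_of, test_fraction=0.2):
    """Split indices so that no group (e.g. a scaffold) spans train and test.

    Groups are visited from largest to smallest; a group goes to train while
    it still fits under the train budget, otherwise the whole group goes to
    test. This is one common heuristic, assumed here for illustration.
    """
    groups = defaultdict(list)
    for idx, item in enumerate(items):
        groups[group_of(item)].append(idx)

    # Largest groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    train_budget = (1.0 - test_fraction) * len(items)

    train, test = [], []
    for members in ordered:
        if len(train) + len(members) <= train_budget:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


def time_aware_split(items, time_of, test_fraction=0.2):
    """Split indices so that the most recent records form the test set."""
    order = sorted(range(len(items)), key=lambda i: time_of(items[i]))
    n_test = max(1, round(test_fraction * len(items)))
    return order[:-n_test], order[-n_test:]


# Hypothetical toy records: scaffold labels and years are made up.
records = [
    {"id": 0, "scaffold": "A", "year": 2015},
    {"id": 1, "scaffold": "A", "year": 2016},
    {"id": 2, "scaffold": "A", "year": 2019},
    {"id": 3, "scaffold": "B", "year": 2017},
    {"id": 4, "scaffold": "B", "year": 2020},
    {"id": 5, "scaffold": "C", "year": 2018},
    {"id": 6, "scaffold": "D", "year": 2021},
    {"id": 7, "scaffold": "A", "year": 2014},
]

scaffold_train, scaffold_test = group_aware_split(
    records, group_of=lambda r: r["scaffold"], test_fraction=0.25
)
time_train, time_test = time_aware_split(
    records, time_of=lambda r: r["year"], test_fraction=0.25
)

print("scaffold split:", scaffold_train, scaffold_test)
print("time split:", time_train, time_test)
```

Under a random split, molecules sharing a scaffold can land on both sides, which tends to inflate test scores. Both functions above make the held-out set differ systematically from the training set, by chemotype in one case and by date in the other.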
