A Systematic Survey and Benchmark of Deep Learning for Molecular Property Prediction in the Foundation Model Era
Zongru Li, Xingsheng Chen, Honggang Wen, Regina Qianru Zhang, Ming Li + 6 more
TLDR
This paper surveys and benchmarks deep learning for molecular property prediction across four paradigms: quantum methods, descriptor-based machine learning, geometric deep learning, and foundation models.
Key contributions
- Presents a unified taxonomy for molecular representations and deep learning models.
- Benchmarks deep learning models on widely used and industry-oriented datasets spanning quantum, physicochemical, physiological, and biophysical properties.
- Highlights challenges in current data curation, splitting, and evaluation protocols.
- Proposes three future directions: physics-aware learning, uncertainty-calibrated foundation models, and realistic multimodal benchmarks.
Why it matters
This survey gives researchers in drug discovery and materials science a single reference for deep learning approaches to molecular property prediction. It pinpoints weaknesses in current benchmarks, such as inconsistent stereochemistry, heterogeneous assay sources, and poorly defined data splits, and proposes research directions to address them.
Original Abstract
Molecular property prediction integrates quantum chemistry, cheminformatics, and deep learning to connect molecular structure with physicochemical and biological behavior. This survey traces four complementary paradigms, including Quantum, Descriptor Machine Learning, Geometric Deep Learning, and Foundation Models, and outlines a unified taxonomy linking molecular representations, model architectures, and interdisciplinary applications. Benchmark analyses integrate evidence from both widely used datasets and datasets reflecting industry perspectives, encompassing quantum, physicochemical, physiological, and biophysical domains. The survey examines current standards in data curation, splitting strategies, and evaluation protocols, highlighting challenges including inconsistent stereochemistry, heterogeneous assay sources, and reproducibility limitations under random or poorly defined splits. These observations motivate the modernization of benchmark design toward more transparent, time- and scaffold-aware methodologies. We further propose three forward-looking directions: (i) physics-aware learning embedding quantum consistency, (ii) uncertainty-calibrated foundation models for trustworthy inference, and (iii) realistic multimodal benchmark ecosystems integrating computational and experimental data. Repository: https://github.com/Zongru-Li/Survey-and-Benchmarks-of-DL-for-Molecular-Property-Prediction-in-the-Foundation-Model-Era.
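To make the idea of a scaffold-aware split concrete, here is a minimal Python sketch. It is not taken from the paper or its repository. It assumes each molecule already has a precomputed scaffold key (for example a Bemis-Murcko scaffold SMILES, which a cheminformatics toolkit such as RDKit can generate), and the function name `scaffold_aware_split`, its parameters, and the example keys are all hypothetical. The point is only that molecules sharing a scaffold are never divided between subsets, which a random split cannot guarantee.

```python
from collections import defaultdict


def scaffold_aware_split(scaffold_keys, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices so that no scaffold key spans two subsets.

    scaffold_keys: one hashable key per molecule (assumed precomputed).
    Returns three lists of indices: train, valid, test.
    """
    # Collect the indices of all molecules that share a scaffold key.
    groups = defaultdict(list)
    for idx, key in enumerate(scaffold_keys):
        groups[key].append(idx)

    # Largest groups first; ties are broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffold_keys)
    train_cap = frac_train * n
    valid_cap = (frac_train + frac_valid) * n

    train, valid, test = [], [], []
    for group in ordered:
        # Assign each whole group to the first subset with room for it.
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Hypothetical scaffold labels for ten molecules.
example_keys = ["benzene"] * 5 + ["pyridine"] * 3 + ["indole", "furan"]
train_idx, valid_idx, test_idx = scaffold_aware_split(example_keys)
```

Because common scaffolds fill the training set first, the validation and test sets tend to contain rarer scaffolds, which makes the evaluation closer to predicting properties for unseen chemotypes. A time-aware split follows the same pattern, with molecules ordered by measurement date instead of grouped by scaffold.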