ArXiv TLDR

Tabular foundation models for in-context prediction of molecular properties

2604.16123

Karim K. Ben Hicham, Jan G. Rittig, Martin Grohe, Alexander Mitsos

cs.LG, physics.chem-ph

TLDR

Tabular foundation models (TFMs) enable accurate, cost-efficient in-context prediction of molecular properties, outperforming fine-tuning on small datasets.

Key contributions

  • TFMs predict molecular properties via in-context learning, eliminating task-specific fine-tuning.
  • TFMs deliver excellent predictive performance at lower computational cost than task-specific fine-tuning.
  • Achieved up to 100% win rates on 30 MoleculeACE tasks when combined with CheMeleon embeddings.
  • Molecular foundation model embeddings and 2D descriptors significantly boost TFM performance.
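The workflow described in the bullets above has two steps: featurize each molecule into a fixed-length row (2D descriptors or frozen foundation-model embeddings), then hand the resulting table to a TFM, which predicts by conditioning on the context rows rather than by gradient-based training. Below is a minimal sketch of that interface, not the paper's pipeline: the descriptor values and property labels are made up, and scikit-learn's `KNeighborsRegressor` stands in for a real TFM (with the `tabpfn` package installed, a `TabPFNRegressor` would plausibly drop in at the marked line; that API is an assumption, not taken from the paper).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Toy descriptor table: one row per molecule, one column per feature.
# In the paper these columns would be RDKit2d/Mordred descriptors or
# frozen CheMeleon embeddings; the numbers here are invented.
X_context = np.array([
    [1.2, 0.3, 5.0],
    [0.9, 0.7, 4.1],
    [2.0, 0.1, 6.3],
    [1.5, 0.5, 5.5],
])
y_context = np.array([0.10, 0.25, 0.05, 0.15])  # made-up property values

X_query = np.array([[1.3, 0.4, 5.2]])  # new molecule to predict

# Stand-in for a tabular foundation model: fit() only ingests the
# context set, and predict() conditions on it with no gradient updates,
# mirroring the in-context-learning interface. A real TFM would be
# swapped in here, e.g. (assumed API):
#   from tabpfn import TabPFNRegressor; model = TabPFNRegressor()
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X_context, y_context)   # "fit" = store the context rows
pred = model.predict(X_query)     # prediction conditioned on context
print(round(float(pred[0]), 3))   # mean of the 3 nearest labels
```

Because nothing is fine-tuned, the same pretrained TFM can be reused across every new small dataset, which is where the cost savings over per-task fine-tuning come from.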

Why it matters

This paper establishes tabular foundation models (TFMs) as a strong method for molecular property prediction, especially on small datasets. Because TFMs predict via in-context learning, they cut the computational cost and machine-learning expertise required compared with traditional fine-tuning, offering an accurate and efficient alternative for drug discovery and chemical engineering.

Original Abstract

Accurate molecular property prediction is central to drug discovery, catalysis, and process design, yet real-world applications are often limited by small datasets. Molecular foundation models provide a promising direction by learning transferable molecular representations; however, they typically involve task-specific fine-tuning, require machine learning expertise, and often fail to outperform classical baselines. Tabular foundation models (TFMs) offer a fundamentally different paradigm: they perform predictions through in-context learning, enabling inference without task-specific training. Here, we evaluate TFMs in the low- to medium-data regime across both standardized pharmaceutical benchmarks and chemical engineering datasets. We evaluate both frozen molecular foundation model representations, as well as classical descriptors and fingerprints. Across the benchmarks, the approach shows excellent predictive performance while reducing computational cost, compared to fine-tuning, with these advantages also transferring to practical engineering data settings. In particular, combining TFMs with CheMeleon embeddings yields up to 100% win rates on 30 MoleculeACE tasks, while compact RDKit2d and Mordred descriptors provide strong descriptor-based alternatives. Molecular representation emerges as a key determinant in TFM performance, with molecular foundation model embeddings and 2D descriptor sets both providing substantial gains over classic molecular fingerprints on many tasks. These results suggest that in-context learning with TFMs provides a highly accurate and cost-efficient alternative for property prediction in practical applications.
