ArXiv TLDR

Predicting Activity Cliffs for Autonomous Medicinal Chemistry

2604.07560

Michael Cuccarese

q-bio.QM cs.LG

TLDR

An 11-feature ML model predicts which positions on a scaffold are true activity cliffs, cutting first-round experiments by 31% across 50 ChEMBL targets in six protein families.

Key contributions

  • Distinguishes between general positional variation and true activity cliffs using SALI normalization.
  • Introduces an 11-feature ML model with 3D pharmacophore context for predicting true activity cliffs.
  • Model generalizes across all six protein families, novel scaffolds, and temporal data splits.
  • Ranks the cliff-prone position first 53% of the time, about twice the random rate (27%), cutting the positions a chemist must explore from 3.1 to 2.1.
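As a rough illustration of the SALI normalization mentioned above, the sketch below uses the standard Structure-Activity Landscape Index: the absolute activity difference of a pair divided by one minus their structural similarity. The fingerprint representation (sets of "on" bits), the Tanimoto similarity, the `eps` guard, and the example values are assumptions for illustration; the summary does not say which similarity measure or activity units the paper uses.

```python
def tanimoto(bits_a: frozenset, bits_b: frozenset) -> float:
    """Tanimoto similarity between two fingerprints given as sets of 'on' bits."""
    if not bits_a and not bits_b:
        return 1.0
    return len(bits_a & bits_b) / len(bits_a | bits_b)


def sali(activity_i: float, activity_j: float, similarity: float,
         eps: float = 1e-6) -> float:
    """Standard SALI: |A_i - A_j| / (1 - sim(i, j)).

    eps is an assumed guard against division by zero for identical fingerprints.
    """
    return abs(activity_i - activity_j) / max(1.0 - similarity, eps)


# Hypothetical matched pair: near-identical structures, 100-fold potency gap.
fp_parent = frozenset(range(1, 10))    # 9 bits
fp_analog = frozenset(range(1, 11))    # same 9 bits plus one new bit
cliff_score = sali(8.0, 6.0, tanimoto(fp_parent, fp_analog))

# Hypothetical pair with the same potency gap but much lower similarity.
fp_distant = frozenset(range(1, 19))   # shares 9 of 18 bits with fp_parent
smooth_score = sali(8.0, 6.0, tanimoto(fp_parent, fp_distant))

print(f"similar pair SALI:    {cliff_score:.1f}")
print(f"dissimilar pair SALI: {smooth_score:.1f}")
```

The same potency gap scores much higher when the two molecules are nearly identical, which is what separates a true cliff from ordinary positional variation.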

Why it matters

Activity cliffs, where a small structural change causes a large potency shift, are critical in drug discovery but have been hard to predict. This model tells medicinal chemists which positions to explore first, cutting the average number of positions explored from 3.1 to 2.1. It is released as open-source code and an interactive webapp, offering a practical tool for autonomous drug design.

Original Abstract

Activity cliff prediction - identifying positions where small structural changes cause large potency shifts - has been a persistent challenge in computational medicinal chemistry. This work focuses on a parsimonious definition: which small modifications, at which positions, confer the highest probability of an outcome change. Position-level sensitivity is calculated using 25 million matched molecular pairs from 50 ChEMBL targets across six protein families, revealing that two questions have fundamentally different answers. "Which positions vary most?" is answered by scaffold size alone (NDCG@3 = 0.966), requiring no machine learning. "Which are true activity cliffs?" - where small modifications cause disproportionately large effects, as captured by SALI normalization - requires an 11-feature model with 3D pharmacophore context (NDCG@3 = 0.910 vs. 0.839 random), generalizing across all six protein families, novel scaffolds (0.913), and temporal splits (0.878). The model identifies the cliff-prone position first 53% of the time (vs. 27% random - 2x lift), reducing positions a chemist must explore from 3.1 to 2.1 - a 31% reduction in first-round experiments. Predicting which modification to make is not tractable from structure alone (Spearman 0.268, collapsing to -0.31 on novel scaffolds). The system is released as open-source code and an interactive webapp.
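The NDCG@3 figures quoted in the abstract measure how well a model ranks the positions of a scaffold by sensitivity. A minimal sketch of that metric follows, assuming linear gain and a log2 rank discount; the abstract does not specify which NDCG variant the authors use, and the position sensitivities and model scores below are made up for illustration.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(true_relevance, predicted_scores, k=3):
    """NDCG@k: DCG of the predicted ordering divided by DCG of the ideal ordering."""
    order = sorted(range(len(predicted_scores)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_relevance[i] for i in order]
    ideal = sorted(true_relevance, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical per-position cliff sensitivities for one four-position scaffold.
true_sensitivity = [0.2, 1.5, 0.7, 0.1]

good_scores = [0.3, 0.9, 0.6, 0.1]   # ranks positions in the true order
poor_scores = [0.9, 0.1, 0.2, 0.8]   # ranks the most sensitive position last

print(f"good ranking NDCG@3: {ndcg_at_k(true_sensitivity, good_scores):.3f}")
print(f"poor ranking NDCG@3: {ndcg_at_k(true_sensitivity, poor_scores):.3f}")
```

A score of 1.0 means the top three predicted positions match the ideal ordering, so the reported 0.910 against a 0.839 random baseline should be read relative to that ceiling.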
