ArXiv TLDR

Curated AI beats frontier LLMs at pharma asset discovery

2605.04908

Łukasz Kidziński, Kevin Thomas

cs.AI, q-bio.QM

TLDR

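Gosset, an AI platform backed by curated drug-asset annotations, returns 3.2x more verified drugs per query than the best of four web-enabled frontier LLMs on ten niche oncology/immunology targets, where most of the pipeline is preclinical or Asian-developed.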

Key contributions

  • Introduces Gosset, an AI platform leveraging curated drug-asset annotations for pharmaceutical discovery.
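  • Benchmarks Gosset against four frontier systems with web access (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro, Perplexity sonar-pro) on ten niche oncology/immunology targets, using the same natural-language query and the same JSON output schema for all five systems.
  • Gosset returns 3.2x more verified drugs per query than the best frontier system, at perfect precision and 100% recall measured against the cross-system union of verified drugs.
  • Exposes the same curated index as a Gosset MCP server that any frontier model can call as a tool, suggesting these systems can close most of the recall gap by swapping generic web search for a curated index.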

Why it matters

General-purpose LLMs with web search are increasingly used to scout pharmaceutical pipelines, yet this benchmark suggests they miss much of the long tail of niche, preclinical, and Asian-developed assets. A curated index recovered 3.2x more verified drugs per query at perfect precision, which points to curated data, exposed as a tool the models can call, as a practical way to improve AI's usefulness in drug development and competitive intelligence.

Original Abstract

General-purpose LLMs with web search are increasingly used to scout the competitive landscape of pharmaceutical pipelines. We benchmark Gosset -- an AI platform with a chat interface backed by curated target-, modality-, and indication-level drug-asset annotations -- against four frontier systems with web access (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro, Perplexity sonar-pro) on ten niche oncology/immunology targets where most of the pipeline lives in the long tail of preclinical and Asian-developed assets. All five systems receive the same natural-language query and the same JSON output schema. Across 10 targets Gosset returns 3.2x more verified drugs per query than the best frontier system, at perfect precision and 100% recall against the cross-system union of verified drugs. The same curated index is exposed as a Gosset MCP server that any frontier model can call as a tool, suggesting that each of these systems can close most of the recall gap by swapping generic web search for a curated index behind the same chat interface.
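
The precision and recall figures in the abstract are measured against the cross-system union of verified drugs. A minimal sketch of how such scoring could be computed is shown below. It is an illustration only, not the authors' evaluation code: the function name `score_systems`, the system labels, and the drug names are all hypothetical placeholders, and the verification step is assumed to have already produced a set of confirmed drug names.

```python
def score_systems(returned, verified):
    """Score each system against the union of verified drugs.

    returned: dict mapping system name -> set of drug names it returned
    verified: set of drug names confirmed as real assets (assumed given)
    """
    # Verified hits per system: what it returned that passed verification.
    hits = {name: drugs & verified for name, drugs in returned.items()}
    # Reference set: every verified drug found by at least one system.
    reference = set().union(*hits.values())

    scores = {}
    for name, drugs in returned.items():
        n_hits = len(hits[name])
        scores[name] = {
            "verified": n_hits,
            # Precision: share of returned drugs that were verified.
            "precision": n_hits / len(drugs) if drugs else 0.0,
            # Recall: share of the cross-system union that was found.
            "recall": n_hits / len(reference) if reference else 0.0,
        }
    return scores


# Hypothetical toy data, not taken from the paper.
returned = {
    "curated": {"drug_a", "drug_b", "drug_c", "drug_d"},
    "web_llm": {"drug_a", "drug_x"},
}
verified = {"drug_a", "drug_b", "drug_c", "drug_d"}

scores = score_systems(returned, verified)
for name, s in scores.items():
    print(name, s)
```

Under this definition, 100% recall means a system found every verified drug that any of the compared systems found. It does not by itself establish coverage of assets that no system returned.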
