Benchmarking open-source tools for in silico antiviral drug discovery

May 5, 2026
arXiv:2605.04265

q-bio.BM, q-bio.QM

TLDR

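This paper benchmarks 15 open-source binding affinity prediction tools for antiviral drug discovery, identifies the top performers, and shows that viral binding data needs careful curation, in particular splitting polyprotein sequences, before it is suitable for training or testing ML models.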

Key contributions

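Curated a custom dataset of 43,005 viral protein-ligand binding measurements from BindingDB and other sources; 31% of the viral protein binding data in BindingDB required polyprotein sequences to be split before use.
Benchmarked 15 open-source binding affinity prediction tools on a test set of 853 antiviral compounds across 16 protein targets from 10 virus species.
Found that Boltz-2 and DrugFormDTA ranked highest overall among ML-based approaches and GNINA did best among docking approaches, with notable variance across specific viral proteins.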
Fine-tuning DrugFormDTA on the custom dataset improved its performance from r=0.5 to r=0.7.
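
The reported scores (r=0.5, r=0.7) are correlation values between predicted and measured binding affinities. As a minimal sketch, assuming r denotes the Pearson correlation coefficient (the summary does not say so explicitly), a per-target score could be computed as below. The function names, record layout, and toy numbers are hypothetical illustrations, not the paper's evaluation code.

```python
from collections import defaultdict
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def per_target_r(records):
    """Group (target, measured, predicted) records by target and score each group.

    The record layout is an assumption made for this sketch.
    """
    grouped = defaultdict(lambda: ([], []))
    for target, measured, predicted in records:
        grouped[target][0].append(measured)
        grouped[target][1].append(predicted)
    return {t: pearson_r(m, p) for t, (m, p) in grouped.items()}


if __name__ == "__main__":
    # Toy values only; not data from the paper.
    toy = [
        ("protease", 6.0, 5.5),
        ("protease", 7.0, 6.5),
        ("protease", 8.0, 7.9),
        ("polymerase", 5.0, 6.0),
        ("polymerase", 6.0, 5.0),
    ]
    for target, r in per_target_r(toy).items():
        print(target, round(r, 3))
```

Scoring each target separately, rather than pooling all compounds, makes the variance across viral proteins visible instead of averaging it away.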

Why it matters

This work provides a critical evaluation of existing computational tools for antiviral drug discovery, which matters because there are no FDA-approved antivirals for the majority of viral families with pandemic potential. By identifying the top-performing models and showing how much data quality affects them, it lays a foundation for new platforms for rapid drug repurposing and antiviral combination design. Antivirals, especially those repurposed from approved drugs, can be deployed quickly during a new outbreak.

Original Abstract

Antivirals are uniquely positioned to be deployed quickly during a new outbreak, especially when repurposed from approved drugs. Yet there are no FDA-approved antivirals for the majority of viral families with pandemic potential. Here we lay out the case for investing in technologies and techniques for antiviral drug discovery and designing antiviral combinations. We present a survey of open source datasets and computational tools for in silico antiviral drug discovery, with a particular focus on the latest AI-based systems and docking tools. We then present our custom dataset of 43,005 viral protein-ligand binding measurements that we curated from BindingDB and other sources. Importantly, we found that 31% of viral protein binding data in BindingDB required polyprotein sequences to be carefully split before the data were suitable for training or testing ML models. Using our custom dataset we fine-tuned the DrugFormDTA binding affinity prediction model (Khokhlov et al. 2025). We then benchmarked 15 open-source binding affinity prediction tools on a custom test set of 853 antiviral compounds spread across 16 different protein targets from 10 virus species. Models tested include Boltz-2, GNINA, FlowDock, Interformer, AutoDock-GPU, and others. We found that Boltz-2 and DrugFormDTA ranked highest overall among ML-based approaches, and GNINA did best among docking approaches, with notable variance across specific viral proteins. Fine-tuning DrugFormDTA on our custom cleaned antiviral dataset boosted performance from $r=0.5$ to $r=0.7$. As part of this work we also compiled a library of approved drugs and a comprehensive list of investigational and approved antiviral drugs that can be viewed at https://antivirals-database.radvac.org. Together, this work provides a foundation for future work towards new tools and platforms for rapid drug repurposing and rapid design of antiviral combinations.
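
The polyprotein issue mentioned in the abstract can be illustrated with a small sketch. Many viruses express one long polyprotein that is later cleaved into individual mature proteins, so a binding measurement annotated with the full polyprotein sequence does not identify the actual target. The sketch below assumes the mature-protein boundaries are already known as 1-based inclusive coordinates; the function name, boundary format, and toy sequence are hypothetical, and the authors' actual curation procedure is not described in this summary.

```python
def split_polyprotein(sequence, boundaries):
    """Cut a polyprotein sequence into named mature proteins.

    boundaries maps a protein name to (start, end), 1-based and inclusive.
    This coordinate convention is an assumption of the sketch.
    """
    mature = {}
    for name, (start, end) in boundaries.items():
        if not (1 <= start <= end <= len(sequence)):
            raise ValueError(f"boundary for {name!r} is outside the sequence")
        # Convert 1-based inclusive coordinates to a Python slice.
        mature[name] = sequence[start - 1:end]
    return mature


if __name__ == "__main__":
    # Toy 10-residue sequence and made-up boundaries, for illustration only.
    toy_sequence = "MKTAYIAKQR"
    toy_boundaries = {"protein_A": (1, 4), "protein_B": (5, 10)}
    for name, seq in split_polyprotein(toy_sequence, toy_boundaries).items():
        print(name, seq)
```

After splitting, each binding measurement can be attached to the mature protein it actually targets rather than to the whole polyprotein.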
