ProDock: From multi-target consensus docking into database-backed storage
Tieu-Long Phan, Lai Hoang Son Le, Thanh-An Pham, Nhu-Ngoc Nguyen Song, Tuyet-Minh Phan + 1 more
TLDR
ProDock is an open-source Python toolkit for reproducible protein-ligand docking, streamlining workflows and storing results in an SQLite database.
Key contributions
- Streamlines protein-ligand docking workflows, addressing common issues like fragmented scripts.
- Offers an open-source Python toolkit for reproducible docking, postprocessing, and data storage.
- Organizes docking into four layers: preprocessing, execution, postprocessing, and SQLite-backed storage.
- Converts diverse, engine-specific outputs into structured, comparable analytical results.
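As a rough illustration of what converting engine-specific output into comparable records can look like, the sketch below parses the `REMARK VINA RESULT:` lines that AutoDock Vina writes into its PDBQT output and maps them onto a small common record type. The summary does not say which docking backends ProDock supports, so the choice of Vina is an assumption, and the names `PoseScore` and `parse_vina_scores` are hypothetical and not part of ProDock's API.

```python
import re
from dataclasses import dataclass

# Per-pose header written by AutoDock Vina in PDBQT output, e.g.
# "REMARK VINA RESULT:    -7.5      0.000      0.000"
# (score in kcal/mol, then RMSD lower bound and RMSD upper bound).
VINA_RESULT = re.compile(
    r"^REMARK VINA RESULT:\s+(-?\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)"
)


@dataclass(frozen=True)
class PoseScore:
    """Hypothetical engine-independent record for one docked pose."""
    engine: str
    receptor: str
    ligand: str
    rank: int      # 1-based order in which the pose appears in the file
    score: float   # engine-reported score


def parse_vina_scores(pdbqt_text: str, receptor: str, ligand: str) -> list[PoseScore]:
    """Collect one PoseScore per 'REMARK VINA RESULT' line in the text."""
    records: list[PoseScore] = []
    for line in pdbqt_text.splitlines():
        match = VINA_RESULT.match(line)
        if match:
            records.append(
                PoseScore(
                    engine="vina",
                    receptor=receptor,
                    ligand=ligand,
                    rank=len(records) + 1,
                    score=float(match.group(1)),
                )
            )
    return records
```

A parser for another backend would return the same `PoseScore` type, which is what makes results from different engines directly comparable downstream.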
Why it matters
Protein-ligand docking workflows are often spread across fragmented scripts, which hurts reproducibility and complicates analysis. ProDock addresses this with a unified, open-source Python toolkit that manages docking campaigns from preprocessing through database storage. The aim is more reproducible runs, simpler comparative analysis across targets, ligands, and docking settings, and clearer data provenance.
Original Abstract
Protein-ligand docking is widely used in structure-based discovery, but routine studies often fail at the workflow level rather than at the scoring level. Receptor cleaning, ligand preparation, file conversion, box definition, run organization, and downstream parsing are frequently handled by fragmented scripts, which reduces reproducibility, obscures provenance, and complicates comparative analysis across targets, ligands, and docking settings. We present ProDock, an open-source Python toolkit for reproducible protein-ligand docking and postprocessing. ProDock organizes application-oriented docking into four connected layers: receptor and ligand preprocessing, provenance-aware docking execution, postprocessing of poses and interaction fingerprints, and SQLite-backed storage for later querying. The package supports inputs ranging from PDB identifiers and local receptor files to SMILES strings and prepared ligand directories, and integrates receptor preparation, ligand preparation, reference-ligand-based box generation, campaign serialization, batch docking, pose crawling, score extraction, interaction profiling, and database insertion within a consistent project-local workflow. By representing studies as explicit many-to-many campaigns linking multiple receptors, ligands, and docking backends, ProDock converts fragmented engine-specific outputs into structured analytical results that are easier to compare, reuse, and audit. ProDock is implemented in Python and released under an open-source license at https://github.com/Medicine-Artificial-Intelligence/ProDock. Documentation is available at https://prodock.readthedocs.io/en/latest.
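To make the idea of a many-to-many campaign stored in SQLite more concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table names, column names, engine labels, and scores are all invented for illustration and are not ProDock's actual schema or results. The sketch also assumes that lower scores indicate better binding.

```python
import sqlite3

# Hypothetical schema: each docking_run row links one receptor, one ligand,
# and one engine, so the table as a whole encodes a many-to-many campaign.
SCHEMA = """
CREATE TABLE receptor (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE ligand (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    smiles TEXT
);
CREATE TABLE docking_run (
    id INTEGER PRIMARY KEY,
    receptor_id INTEGER NOT NULL REFERENCES receptor(id),
    ligand_id INTEGER NOT NULL REFERENCES ligand(id),
    engine TEXT NOT NULL,
    best_score REAL,
    UNIQUE (receptor_id, ligand_id, engine)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database filled with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO receptor (name) VALUES (?)",
                     [("recA",), ("recB",)])
    conn.executemany("INSERT INTO ligand (name, smiles) VALUES (?, ?)",
                     [("ethanol", "CCO"), ("benzene", "c1ccccc1")])
    # (receptor_id, ligand_id, engine, best_score); scores are illustrative.
    runs = [
        (1, 1, "engine_x", -3.1),
        (1, 2, "engine_x", -4.2),
        (2, 1, "engine_x", -2.8),
        (2, 2, "engine_y", -4.9),
    ]
    conn.executemany(
        "INSERT INTO docking_run (receptor_id, ligand_id, engine, best_score) "
        "VALUES (?, ?, ?, ?)",
        runs,
    )
    conn.commit()
    return conn


def best_ligand_per_receptor(conn: sqlite3.Connection) -> list[tuple]:
    """Return (receptor, ligand, score) for the lowest score per receptor.

    Relies on SQLite's documented behavior that, with a single MIN()
    aggregate, bare columns are taken from the row holding the minimum.
    """
    query = """
        SELECT r.name, l.name, MIN(d.best_score)
        FROM docking_run AS d
        JOIN receptor AS r ON r.id = d.receptor_id
        JOIN ligand AS l ON l.id = d.ligand_id
        GROUP BY r.id
        ORDER BY r.name
    """
    return conn.execute(query).fetchall()
```

Calling `best_ligand_per_receptor(build_demo_db())` returns one row per receptor. Keeping receptors, ligands, and runs in separate tables is one way to let the same ligand be docked against many targets, or with several backends, without duplicating its metadata.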