oxo-call: Documentation-grounded Skill Augmentation for Accurate Bioinformatics Command-line Generation with Large Language Models
Yun Peng, Yujun Sun, Jia Ding, Bin Yan, Zhangyu Wang + 4 more
TLDR
oxo-call is a Rust-based LLM assistant that generates accurate bioinformatics command-line invocations using documentation grounding and expert skill augmentation.
Key contributions
- Translates natural language task descriptions into precise bioinformatics commands.
- Grounds the LLM in the complete, version-specific help text of each target tool to improve accuracy.
- Augments the LLM with >150 curated expert skills covering domain concepts, common pitfalls, and worked examples.
- Logs every generated command with provenance metadata for reproducibility, and supports local LLM inference for data privacy.
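The two strategies in the list above, documentation grounding and skill augmentation, can be pictured as prompt assembly. The following is a minimal sketch under stated assumptions, not oxo-call's actual implementation: the `Skill` struct, its field names, the `build_prompt` function, and the section headings in the prompt are all hypothetical, and the help text is a placeholder rather than real tool output.

```rust
/// Hypothetical skill record; field names are illustrative, not oxo-call's schema.
struct Skill {
    concepts: Vec<String>,
    pitfalls: Vec<String>,
    /// (task description, worked command) pairs.
    examples: Vec<(String, String)>,
}

/// Assemble one prompt: verbatim help text first, then skill material, then the task.
/// The layout is an assumption made for illustration.
fn build_prompt(tool: &str, help_text: &str, skill: &Skill, task: &str) -> String {
    let mut p = format!("Tool: {tool}\n\n## Documentation\n{}\n", help_text.trim());
    p.push_str("\n## Concepts\n");
    for c in &skill.concepts {
        p.push_str(&format!("- {c}\n"));
    }
    p.push_str("\n## Pitfalls\n");
    for w in &skill.pitfalls {
        p.push_str(&format!("- {w}\n"));
    }
    p.push_str("\n## Worked examples\n");
    for (t, cmd) in &skill.examples {
        p.push_str(&format!("- {t}: `{cmd}`\n"));
    }
    p.push_str(&format!("\n## Task\n{task}\n"));
    p
}

fn main() {
    let skill = Skill {
        concepts: vec!["BAM files are usually coordinate-sorted before indexing".to_string()],
        pitfalls: vec!["Indexing an unsorted BAM file fails".to_string()],
        examples: vec![(
            "sort a BAM file by coordinate".to_string(),
            "samtools sort -o sorted.bam input.bam".to_string(),
        )],
    };
    // Placeholder: a real run would capture the installed tool's own help output here.
    let help_text = "<help text captured from the installed tool>";
    let prompt = build_prompt("samtools", help_text, &skill, "sort reads.bam by coordinate");
    println!("{prompt}");
}
```

Placing the help text ahead of the skill material is a design choice of this sketch: it keeps the version-specific documentation as the primary reference and treats the curated skill content as supporting context.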
Why it matters
Bioinformatics command-line tools differ widely in syntax and parameterization, which the authors describe as a persistent barrier to productive research. oxo-call targets that barrier by combining documentation grounding with expert skill augmentation, so that generated commands reflect each tool's actual, version-specific options. Provenance logging of every generated command and support for local LLM inference make it practical for reproducible and privacy-sensitive work.
Original Abstract
Command-line bioinformatics tools remain essential for genomic analysis, yet their diversity in syntax and parameterization presents a persistent barrier to productive research. We present oxo-call, a Rust-based command-line assistant that translates natural-language task descriptions into accurate tool invocations through two complementary strategies: documentation-first grounding, which provides the large language model (LLM) with the complete, version-specific help text of each target tool, and curated skill augmentation, which primes the model with domain-expert concepts, common pitfalls, and worked examples. oxo-call (v0.10) ships >150 built-in skills covering 44 analytical categories, from variant calling and genome assembly to single-cell transcriptomics, compiled into a single, statically linked binary. Every generated command is logged with provenance metadata to support reproducible research. oxo-call also provides a DAG-based workflow engine, extensibility through user-defined and community skills via the Model Context Protocol, and support for local LLM inference to address data-privacy requirements. oxo-call is freely available for academic use at https://traitome.github.io/oxo-call/.
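The abstract states that every generated command is logged with provenance metadata but does not describe the record format. As a rough illustration only, the sketch below serializes one such record as a JSON line using the Rust standard library. The `Provenance` struct, its fields, the `json_escape` helper, and the example values (including the version string and model name) are assumptions made for this example, not oxo-call's actual log schema.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical provenance record; the fields are assumptions, not oxo-call's schema.
struct Provenance<'a> {
    unix_time: u64,
    tool: &'a str,
    tool_version: &'a str,
    task: &'a str,
    command: &'a str,
    model: &'a str,
}

/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl Provenance<'_> {
    /// Serialize the record as a single JSON line, suitable for an append-only log.
    fn to_json_line(&self) -> String {
        format!(
            "{{\"unix_time\":{},\"tool\":\"{}\",\"tool_version\":\"{}\",\"task\":\"{}\",\"command\":\"{}\",\"model\":\"{}\"}}",
            self.unix_time,
            json_escape(self.tool),
            json_escape(self.tool_version),
            json_escape(self.task),
            json_escape(self.command),
            json_escape(self.model),
        )
    }
}

fn main() {
    // Seconds since the Unix epoch; falls back to 0 if the clock is before the epoch.
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    let record = Provenance {
        unix_time: now,
        tool: "samtools",
        tool_version: "example-version", // placeholder, not a real version string
        task: "sort reads.bam by coordinate",
        command: "samtools sort -o sorted.bam reads.bam",
        model: "local-model", // placeholder model name
    };
    // A real logger would append this line to a file; printing keeps the sketch side-effect free.
    println!("{}", record.to_json_line());
}
```

Recording the tool version next to the command matters for reproducibility because the same natural-language task can map to different flags across versions of a tool.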