ArXiv TLDR

LLARS: Enabling Domain Expert & Developer Collaboration for LLM Prompting, Generation and Evaluation

2605.10593

Philipp Steigerwald, Mara Stieler, Jennifer Burghardt, Eric Rudolph, Jens Albrecht

cs.AI, cs.CL, cs.HC, cs.SE

TLDR

LLARS is an open-source platform that lets domain experts and developers collaboratively engineer prompts, batch-generate LLM outputs, and evaluate those outputs in one pipeline.

Key contributions

  • Real-time collaborative prompt engineering with version control and instant LLM testing.
  • Configurable batch generation of LLM outputs across prompts, models, and data, with cost control.
  • Hybrid evaluation combining human and LLM assessments, featuring live agreement metrics.
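  • Tight integration of the three modules into an end-to-end pipeline: new prompts and models are automatically available for batch generation, and completed batches become evaluation scenarios with a single click.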
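
The summary does not say which agreement metrics LLARS reports. As one illustration of what a live human-versus-LLM agreement score can look like, here is a minimal sketch of Cohen's kappa for two raters who label the same outputs. The function name and the sample labels are hypothetical and not taken from LLARS.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items.

    Illustrative only: the metric LLARS actually uses is not stated
    in the summary, so this is an assumed stand-in.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty item list")

    n = len(ratings_a)
    # Share of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four generated outputs.
human = ["good", "good", "bad", "bad"]
llm_judge = ["good", "bad", "bad", "bad"]
kappa = cohen_kappa(human, llm_judge)
```

Kappa corrects raw agreement for chance, so a value near 0 means the LLM judge agrees with the human no more often than random labelling would, even if the raw match rate looks high.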

Why it matters

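LLARS bridges the collaboration gap between domain experts and developers in LLM projects by combining prompt engineering, generation, and evaluation in a single pipeline. In the authors' interviews, six domain experts and three developers working in online counselling reported that it feels intuitive and saves considerable time by keeping everything in one place.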

Original Abstract

We demonstrate LLARS (LLM Assisted Research System), an open-source platform that bridges the gap between domain experts and developers for building LLM-based systems. It integrates three tightly connected modules into an end-to-end pipeline: Collaborative Prompt Engineering for real-time co-authoring with version control and instant LLM testing, Batch Generation for configurable output production across user-selected prompts $\times$ models $\times$ data with cost control, and Hybrid Evaluation where human and LLM evaluators jointly assess outputs through diverse assessment methods, with live agreement metrics and provenance analysis to identify the best model-prompt combination for a given use case. New prompts and models are automatically available for batch generation and completed batches can be turned into evaluation scenarios with a single click. Interviews with six domain experts and three developers in online counselling confirmed that LLARS feels intuitive, saves considerable time by keeping everything in one place and makes interdisciplinary collaboration seamless.
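
The prompts × models × data grid with cost control described in the abstract can be pictured as a Cartesian product cut off by a budget. The sketch below is an assumption-laden illustration, not LLARS code: `plan_batch`, the flat per-call price table, and the integer-cent budget are all hypothetical, and the abstract does not specify how LLARS actually estimates or caps cost.

```python
from itertools import product


def plan_batch(prompts, models, records, cost_per_call_cents, budget_cents):
    """Enumerate prompt x model x record jobs until the budget is used up.

    Hypothetical cost model: each model has one flat price per call,
    expressed in integer cents to avoid floating-point drift.
    """
    jobs = []
    spent = 0
    for prompt, model, record in product(prompts, models, records):
        cost = cost_per_call_cents[model]
        if spent + cost > budget_cents:
            # Stop at the first job that would exceed the budget.
            break
        jobs.append((prompt, model, record))
        spent += cost
    return jobs, spent


# Hypothetical inputs: 2 prompts x 2 models x 3 records = 12 possible jobs.
prompts = ["p1", "p2"]
models = ["m1", "m2"]
records = ["r1", "r2", "r3"]
prices = {"m1": 1, "m2": 2}

full_jobs, full_cost = plan_batch(prompts, models, records, prices, budget_cents=100)
capped_jobs, capped_cost = plan_batch(prompts, models, records, prices, budget_cents=5)
```

Stopping at the first over-budget job keeps the planned set a contiguous prefix of the grid, which makes a partial batch easy to resume. A real system might instead skip expensive jobs and continue with cheaper ones.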
