ArXiv TLDR

Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research

arXiv:2604.17843

Nimisha Karnatak, Mohamad Chatila, Daniel Alejandro Pinzón Hernández, Reza Yazdanfar, Michelle Dugas + 1 more

cs.HC, cs.AI

TLDR

AVA is a trustworthy GenAI platform for policy research that uses a curated document library and 'epistemic humility' to deliver verifiable, evidence-based insights while saving users time.

Key contributions

  • AVA is a GenAI platform for policy research, leveraging 4,000+ World Bank reports.
  • It ensures trustworthiness via citation verifiability and reasoned abstention.
  • Evaluation showed users saved 2.4-3.9 hours weekly, using AVA as an "evidence engine."
  • Contributes design guidelines for specialized AI and "ecosystem-aware" Humble AI.

Why it matters

General-purpose LLMs pose misinformation risks for policy experts. This paper introduces AVA, a specialized GenAI platform that provides verifiable, evidence-based insights grounded in curated data. It demonstrates how 'epistemic humility' in AI can build trust, save users measurable time, and serve as a model for future trustworthy AI systems.

Original Abstract

General-purpose LLMs pose misinformation risks for development and policy experts, lacking epistemic humility for verifiable outputs. We present AVA (AI + Verified Analysis), a GenAI platform built on a curated library of over 4,000 World Bank Reports with multilingual capabilities. AVA's multi-agent pipeline enables users to query and receive evidence-based syntheses. It operationalizes epistemic humility through two mechanisms: citation verifiability (tracing claims to sources) and reasoned abstention (declining unsupported queries with justification and redirection). We conducted an in-the-wild evaluation with over 2,200 individuals from heterogeneous organisations and roles in 116 countries, via log analysis, surveys, and 20 interviews. Difference-in-Differences estimates associate sustained engagement with 2.4-3.9 hours saved weekly. Qualitatively, participants used AVA as a specialized "evidence engine"; reasoned abstention clarified scope boundaries, and trust was calibrated through institutional provenance and page-anchored citations. We contribute design guidelines for specialized AI and articulate a vision for "ecosystem-aware" Humble AI.
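The abstract's two "epistemic humility" mechanisms — citation verifiability (tracing claims to sources) and reasoned abstention (declining unsupported queries with justification and redirection) — can be illustrated with a minimal sketch. Everything below (the `Passage` type, the `SUPPORT_THRESHOLD` cutoff, the function names) is a hypothetical assumption for illustration, not the authors' implementation:

```python
# Hypothetical sketch of citation verifiability and reasoned abstention.
# Names and the threshold are illustrative assumptions, not AVA's actual pipeline.
from dataclasses import dataclass

@dataclass
class Passage:
    report_id: str   # e.g. a World Bank report identifier
    page: int        # page anchor used for the citation
    text: str
    score: float     # retrieval relevance score in [0, 1]

SUPPORT_THRESHOLD = 0.5  # assumed cutoff below which the system abstains

def synthesize(query: str, passages: list[Passage]) -> str:
    # Placeholder for the LLM synthesis step of the multi-agent pipeline.
    return " ".join(p.text for p in passages)

def answer_or_abstain(query: str, retrieved: list[Passage]) -> dict:
    """Return an evidence-backed answer with page-anchored citations,
    or a reasoned abstention when no passage supports the query."""
    supported = [p for p in retrieved if p.score >= SUPPORT_THRESHOLD]
    if not supported:
        # Reasoned abstention: decline with justification and redirection.
        return {
            "answer": None,
            "reason": f"No passage in the curated library supports: {query!r}",
            "redirect": "Rephrase the query or consult the source reports directly.",
        }
    # Citation verifiability: every claim traces to a report and page.
    citations = [{"report": p.report_id, "page": p.page} for p in supported]
    return {"answer": synthesize(query, supported), "citations": citations}
```

The key design point the paper highlights is that abstention is *reasoned*: the refusal carries a justification and a redirection rather than a bare "I can't answer", which the evaluation found helped users calibrate the system's scope boundaries.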
