ArXiv TLDR

From Black-Box Confidence to Measurable Trust in Clinical AI: A Framework for Evidence, Supervision, and Staged Autonomy

arXiv:2604.26671

Serhii Zabolotnii, Viktoriia Holinko, Olha Antonenko

cs.CL · cs.AI · cs.CY

TLDR

This paper proposes a framework for trustworthy clinical AI that integrates evidence, human supervision, and staged autonomy, treating trust as a measurable system property rather than a subjective impression.

Key contributions

  • Introduces a framework for trustworthy clinical AI based on evidence, supervision, and staged autonomy.
  • Combines deterministic clinical logic with a patient-specific AI assistant and multi-tier escalation.
  • Proposes a human supervision layer for verification, escalation, and risk control in clinical AI.
  • Defines measurable trust metrics using metrological principles like uncertainty and traceability.
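The calibration component of such trust metrics can be made concrete. As an illustrative sketch (not from the paper itself), Expected Calibration Error (ECE) is one standard way to quantify how well a model's stated confidence matches its observed accuracy:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Mean |accuracy - confidence| over equal-width confidence bins,
    weighted by bin size. Lower means better calibrated."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bin is (lo, hi]; the lowest bin also includes confidence == 0.
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == lo)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece

# A perfectly calibrated toy example: predictions made with 80%
# confidence that are right 80% of the time contribute zero error.
preds = [0.8, 0.8, 0.8, 0.8, 0.8]
hits = [1, 1, 1, 1, 0]
print(expected_calibration_error(preds, hits))
```

A per-layer ECE (deterministic core, AI assistant, escalation tiers) would give the kind of quantitative, layer-by-layer assessment the paper argues for, though the authors' actual metric definitions may differ.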

Why it matters

This paper matters because it shifts the focus from model accuracy and subjective user impressions to an engineered, measurable notion of system-level trust in clinical AI. It provides a practical framework with concrete architectural components and quantitative metrics, enabling safer and more reliable integration of AI into healthcare, which is essential for real-world clinical adoption.

Original Abstract

Trust in clinical artificial intelligence (AI) cannot be reduced to model accuracy, fluency of generation, or overall positive user impression. In medicine, trust must be engineered as a measurable system property grounded in evidence, supervision, and operational boundaries of AI autonomy. This article proposes a practical framework for trustworthy clinical AI built around three principles: evidence, supervision, and staged autonomy. Rather than replacing deterministic clinical logic wholesale with end-to-end black-box models, the proposed approach combines a deterministic core, a patient-specific AI assistant for contextual validation, a multi-tier model escalation mechanism, and a human supervision layer for verification, escalation, and risk control. We demonstrate that trust also depends on selective verification of clinically critical findings, bounded clinical context, disciplined prompt architecture, and careful evaluation on realistic cases. Classifier-driven modular prompting is examined as an incremental path to scaling clinical depth without sacrificing prompt performance and without waiting for complete rule-based coverage. To operationalize trust, a set of trust metrics is proposed, built on metrological principles -- measurement uncertainty, calibration, traceability -- enabling quantitative rather than subjective assessment of each architectural layer. In this perspective, trustworthy clinical AI emerges not as a property of an individual model, but as an architectural outcome of a system into which evidence trails, human oversight, tiered escalation, and graduated action rights are embedded from the outset.
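The multi-tier escalation idea in the abstract can be sketched as simple routing logic. This is a hypothetical illustration; the tier names, the confidence threshold, and the routing order are assumptions, not the authors' implementation:

```python
def route_finding(finding, rules, assistant_confidence, *,
                  critical=False, confidence_floor=0.9):
    """Return which tier handles a finding: 'deterministic',
    'ai_assistant', or 'human_review'."""
    if critical:
        return "human_review"      # clinically critical findings are always verified
    if finding in rules:
        return "deterministic"     # covered by the rule-based core
    if assistant_confidence >= confidence_floor:
        return "ai_assistant"      # contextual validation by the AI assistant
    return "human_review"          # low confidence escalates to a human

rules = {"normal_sinus_rhythm"}
print(route_finding("normal_sinus_rhythm", rules, 0.95))  # deterministic
print(route_finding("rare_arrhythmia", rules, 0.95))      # ai_assistant
print(route_finding("rare_arrhythmia", rules, 0.5))       # human_review
```

The design choice the sketch highlights is the paper's central one: autonomy is granted in stages, and the human supervision layer is the default destination whenever evidence (rule coverage) or confidence runs out.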
