TRACE: A Metrologically-Grounded Engineering Framework for Trustworthy Agentic AI Systems in Operationally Critical Domains
TLDR
TRACE is a new engineering framework for trustworthy agentic AI in critical domains, featuring a layered architecture, metrological trust metrics, and a parsimony principle.
Key contributions
- Introduces TRACE, a four-layer engineering framework for trustworthy agentic AI in critical domains.
- Separates classical-ML components from LLM validators (L2a/L2b), so that using an LLM is a deliberate design decision rather than an architectural default.
- Incorporates a metrologically grounded trust-metric suite aligned with GUM/VIM/ISO 17025.
- Presents the Computational Parsimony Ratio (CPR) as a new design principle for model parsimony.
Why it matters
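TRACE gives teams building AI agents for operationally critical settings (the paper's examples are clinical decision support, industrial operations, and a judicial assistant) a shared architecture with trust metrics mapped to established measurement standards (GUM/VIM/ISO 17025). Together, the L2a/L2b split and the Computational Parsimony Ratio make the choice to use an LLM an explicit, quantified design decision instead of a default.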
Original Abstract
We introduce TRACE, a cross-domain engineering framework for trustworthy agentic AI in operationally critical domains. TRACE combines a four-layer reference architecture with an explicit classical-ML vs. LLM-validator split (L2a/L2b), a stateful orchestration-and-escalation policy (L3), and bounded human supervision (L4); a metrologically grounded trust-metric suite mapped to GUM/VIM/ISO 17025; and a Model-Parsimony principle quantified by the Computational Parsimony Ratio (CPR). Three instantiations--clinical decision support, industrial multi-domain operations, and a judicial AI assistant--transfer the same architecture and metrics across principally different governance contexts. The L2a/L2b separation makes the use of large language models a deliberate design decision rather than an architectural default, with parsimony quantified through CPR. TRACE introduces CPR as a first-class design principle in trustworthy-AI engineering.
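The abstract does not spell out how the L3 escalation policy routes work between layers, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes a confidence-threshold rule: a classical model (L2a) answers first, an LLM validator (L2b) is consulted only when confidence is low, and a human (L4) decides when the validator rejects the proposal. The names `route`, `Decision`, `confidence_threshold`, and the stub callables are all hypothetical, and the sketch is stateless even though the paper describes L3 as stateful.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Decision:
    label: str
    handled_by: str  # "L2a", "L2b", or "L4" (hypothetical tags for this sketch)


def route(
    case: dict,
    classical_model: Callable[[dict], Tuple[str, float]],
    llm_validator: Callable[[dict, str], bool],
    human_review: Callable[[dict], str],
    confidence_threshold: float = 0.9,  # assumed value, not from the paper
) -> Decision:
    """Assumed escalation rule: L2a first, then L2b, then L4."""
    label, confidence = classical_model(case)
    if confidence >= confidence_threshold:
        # The classical model is confident enough; no LLM call is made.
        return Decision(label, "L2a")
    if llm_validator(case, label):
        # The LLM validator accepts the low-confidence proposal.
        return Decision(label, "L2b")
    # The validator rejected the proposal, so a human makes the call.
    return Decision(human_review(case), "L4")


def stub_model(case: dict) -> Tuple[str, float]:
    # Stand-in for a real classical model: returns a fixed label
    # and uses the case's "score" field as its confidence.
    return ("approve", case["score"])


decision = route(
    {"score": 0.95},
    stub_model,
    llm_validator=lambda case, label: True,
    human_review=lambda case: "defer",
)
print(decision.handled_by)  # L2a
```

Under these assumptions, the LLM is invoked only on the low-confidence path, which is one way to make its use an explicit, countable event rather than a default step.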