ArXiv TLDR

Governing What You Cannot Observe: Adaptive Runtime Governance for Autonomous AI Agents

arXiv: 2604.24686

German Marin, Jatin Chaudhary

cs.AI

TLDR

This paper introduces a framework and system, RiskGate, for adaptively governing autonomous AI agents by estimating a bound on unobserved risk and permitting only actions whose capacity exceeds that bound.

Key contributions

  • Proposes the Informational Viability Principle to govern AI agents by estimating unobserved risk.
  • Introduces the Agent Viability Framework, whose three properties — monitoring (P1), anticipation (P2), and monotonic restriction (P3) — are individually necessary and collectively sufficient to cover documented agent failure modes.
  • Presents RiskGate, a reference implementation combining dedicated statistical estimators (KL divergence, segment-vs-rest z-tests, sequential pattern matching) with a fail-secure monotonic pipeline.
  • Develops a predictive Viability Index (VI(t)) to shift governance from reactive to proactive.
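The gating rule behind the first contribution can be sketched in a few lines. This is a minimal illustration of the decision inequality stated in the abstract — allow an action only when capacity $S(x)$ exceeds the risk bound $\hat{B}(x) = U(x) + SB(x) + RG(x)$ by a safety margin — not the paper's implementation; the function name, scalar inputs, and margin value are illustrative assumptions.

```python
def viability_gate(S, U, SB, RG, margin=0.1):
    """Fail-secure decision sketch: permit an action only when
    S(x) - B_hat(x) >= margin, where B_hat = U + SB + RG."""
    B_hat = U + SB + RG  # estimated bound on unobserved risk
    return (S - B_hat) >= margin

# Capacity 1.0 vs. risk components summing to 0.8: 0.2 >= 0.1, so permitted.
print(viability_gate(1.0, 0.3, 0.2, 0.3))  # True
# Risk components summing to 1.0 leave no margin, so the action is blocked.
print(viability_gate(1.0, 0.4, 0.3, 0.3))  # False
```

Failing secure means the gate defaults to blocking whenever the estimated bound leaves insufficient margin, rather than defaulting to allowing.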

Why it matters

Autonomous AI agents can become unsafe even without code changes. This paper offers a novel theoretical framework and a practical system, RiskGate, to continuously govern agents by predicting and mitigating unobserved risks, enhancing their runtime safety and reliability.

Original Abstract

Autonomous AI agents can remain fully authorized and still become unsafe as behavior drifts, adversaries adapt, and decision patterns shift without any code change. We propose the **Informational Viability Principle**: governing an agent reduces to estimating a bound on unobserved risk $\hat{B}(x) = U(x) + SB(x) + RG(x)$ and allowing an action only when its capacity $S(x)$ exceeds $\hat{B}(x)$ by a safety margin. The **Agent Viability Framework**, grounded in Aubin's viability theory, establishes three properties — monitoring (P1), anticipation (P2), and monotonic restriction (P3) — as individually necessary and collectively sufficient for documented failure modes. **RiskGate** instantiates the framework with dedicated statistical estimators (KL divergence, segment-vs-rest $z$-tests, sequential pattern matching), a fail-secure monotonic pipeline, and a closed-loop Autopilot formalised as an instance of Aubin's regulation map with kill-switch-as-last-resort; a scalar Viability Index $VI(t) \in [-1,+1]$ with first-order $t^*$ prediction transforms governance from reactive to predictive. Contributions are the theoretical framework, the reference implementation, and analytical coverage against published agent-failure taxonomies; quantitative empirical evaluation is scoped as follow-up work.
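The abstract's "first-order $t^*$ prediction" can be illustrated with a simple sketch: fit a line to recent Viability Index samples $VI(t) \in [-1,+1]$ and extrapolate when the trend would cross a critical threshold. The least-squares fit, the threshold value, and the function name are assumptions for illustration, not the paper's method.

```python
def predict_t_star(times, vi_values, threshold=0.0):
    """First-order (linear) extrapolation of VI(t): returns the time t* at
    which the fitted line crosses `threshold`, or None when VI is flat or
    improving (no crossing predicted)."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(vi_values) / n
    # Ordinary least-squares slope and intercept of VI against time.
    num = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, vi_values))
    den = sum((t - t_mean) ** 2 for t in times)
    slope = num / den
    if slope >= 0:  # not trending toward the threshold
        return None
    intercept = v_mean - slope * t_mean
    return (threshold - intercept) / slope

# VI decaying by 0.2 per step from 0.8: the line reaches 0.0 at t = 4.0.
print(predict_t_star([0, 1, 2], [0.8, 0.6, 0.4]))  # 4.0
```

A predicted $t^*$ gives the governor lead time to restrict the agent before the viability boundary is reached, which is what shifts governance from reactive to predictive.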

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.