ArXiv TLDR

Bounded Autonomy for Enterprise AI: Typed Action Contracts and Consumer-Side Execution

arXiv: 2604.14723

Sarmad Sohail, Ghufran Haider

cs.SE, cs.AI

TLDR

A new bounded-autonomy architecture enables safe and effective use of large language models as enterprise system operators, preventing costly errors.

Key contributions

  • Introduces a bounded-autonomy architecture using typed action contracts and consumer-side execution for LLM safety.
  • Constraints include permission-aware capabilities, scoped context, validation before side effects, and human approval.
  • Completed 23/25 tasks with zero unsafe executions in trials, versus 17/25 for the unconstrained configuration with safety layers disabled.
  • Delivered a 13-18x speedup over manual operation, with structured validation feedback guiding the model to correct outcomes in fewer turns.
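The constraints listed above can be sketched as a minimal action-contract layer. This is an illustrative reconstruction, not the paper's code: the names `ActionContract`, `MANIFEST`, and `run_proposed_action` are hypothetical. The point is the ordering — manifest lookup, permission check, typed parameter validation, and optional approval all happen before any side effect.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ActionContract:
    """One entry in the published actions manifest (hypothetical shape)."""
    name: str
    params: dict                     # parameter name -> expected Python type
    required_permission: str
    requires_approval: bool
    execute: Callable[[dict], str]   # side-effecting call, invoked last

# The orchestration engine only sees actions explicitly published here.
MANIFEST = {
    "update_ticket_status": ActionContract(
        name="update_ticket_status",
        params={"ticket_id": str, "status": str},
        required_permission="tickets:write",
        requires_approval=False,
        execute=lambda p: f"ticket {p['ticket_id']} -> {p['status']}",
    ),
}

def run_proposed_action(name, proposed_params, user_permissions, approved=False):
    """Validate a model-proposed action; the side effect runs only after all checks pass."""
    contract = MANIFEST.get(name)
    if contract is None:
        return "rejected: unknown action"            # not in the published manifest
    if contract.required_permission not in user_permissions:
        return "rejected: missing permission"        # permission-aware capability exposure
    for key, expected in contract.params.items():    # typed validation before side effects
        if key not in proposed_params or not isinstance(proposed_params[key], expected):
            return f"rejected: invalid or missing parameter '{key}'"
    if contract.requires_approval and not approved:
        return "pending: human approval required"    # optional human-approval gate
    return contract.execute(proposed_params)
```

In this sketch a malformed or unauthorized proposal is rejected with a structured reason rather than executed, which is the structural enforcement the paper attributes to its code-level safety layers.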

Why it matters

This paper presents a practical, deployed architecture that makes imperfect LLMs operationally useful and safe in enterprise systems. It addresses critical safety concerns, preventing costly errors while still leveraging AI for significant speedups. This approach is crucial for broader AI adoption in sensitive business environments.

Original Abstract

Large language models are increasingly used as natural-language interfaces to enterprise software, but their direct use as system operators remains unsafe. Model errors can propagate into unauthorized actions, malformed requests, cross-workspace execution, and other costly failures. We argue this is primarily an execution architecture problem. We present a bounded-autonomy architecture in which language models may interpret intent and propose actions, but all executable behavior is constrained by typed action contracts, permission-aware capability exposure, scoped context, validation before side effects, consumer-side execution boundaries, and optional human approval. The enterprise application remains the source of truth for business logic and authorization, while the orchestration engine operates over an explicit published actions manifest. We evaluate the architecture in a deployed multi-tenant enterprise application across three conditions: manual operation, unconstrained AI with safety layers disabled, and full bounded autonomy. Across 25 scenario trials spanning seven failure families, the bounded-autonomy system completed 23 of 25 tasks with zero unsafe executions, while the unconstrained configuration completed only 17 of 25. Two wrong-entity mutations escaped all consumer-contributed layers; only disambiguation and confirmation mechanisms intercept this class. Both AI conditions delivered 13-18x speedup over manual operation. Critically, removing safety layers made the system less useful: structured validation feedback guided the model to correct outcomes in fewer turns, while the unconstrained system hallucinated success. Several safety properties are structurally enforced by code and intercepted all targeted violations regardless of model output. The result is a practical, deployed architecture for making imperfect language models operationally useful in enterprise systems.
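The abstract's observation that structured validation feedback guided the model to correct outcomes in fewer turns can be illustrated with a toy correction loop. Everything below is an invented stand-in, not the deployed system: `toy_model` first omits a required field, then fixes its proposal once the validator's structured error is fed back.

```python
def correction_loop(model_propose, validate, execute, max_turns=3):
    """Retry loop: rejected proposals return structured errors to the model."""
    feedback = None
    for turn in range(1, max_turns + 1):
        proposal = model_propose(feedback)
        errors = validate(proposal)
        if not errors:
            return execute(proposal), turn   # side effect only after validation passes
        feedback = errors                    # structured feedback, not silent failure
    return None, max_turns

# Toy stand-ins for the model and validator.
def toy_model(feedback):
    if feedback is None:
        return {"ticket_id": "T-1"}                       # first attempt omits "status"
    return {"ticket_id": "T-1", "status": "closed"}       # corrected after feedback

def toy_validate(proposal):
    return [] if "status" in proposal else ["missing field: status"]

result, turns = correction_loop(
    toy_model, toy_validate,
    lambda p: f"ticket {p['ticket_id']} -> {p['status']}",
)
```

Here the loop converges on the second turn; without the validation layer, the first malformed proposal would simply have been executed (or, as the abstract notes for the unconstrained condition, reported as a hallucinated success).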
