ArXiv TLDR

Shields to Guarantee Probabilistic Safety in MDPs

arXiv:2605.10888

Linus Heck, Filip Macák, Roman Andriushchenko, Milan Češka, Sebastian Junges

cs.LO, cs.AI

TLDR

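This paper extends classical safety shields to probabilistic safety in Markov Decision Processes (MDPs). It shows that the strong guarantees of classical shields cannot be preserved in this setting and introduces offline and online shield constructions that still ensure strong safety guarantees.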

Key contributions

  • Introduces a formal framework that conservatively extends classical shields to probabilistic safety in MDPs.
  • Shows that the strong safety and permissiveness guarantees of classical shields cannot be preserved for probabilistic safety.
  • Provides natural probabilistic shields that come with weaker guarantees.
  • Presents new offline and online shield constructions that ensure strong probabilistic safety guarantees.
  • Empirically demonstrates the practical advantages and computational feasibility of the new shields.
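One way to picture an online shield with a strong probabilistic guarantee is to carry a risk budget along the run. The sketch below is a hypothetical budget-tracking shield, offered as an illustration under our own assumptions rather than as the paper's construction. In state s with budget b, it permits any action a whose risk Q(s, a) is at most b, where Q(s, a) is the probability of reaching a bad state when taking a and then acting as safely as possible. After observing the successor t, it sets the new budget to Pmin(t) + (b - Q(s, a)). The expected next budget therefore equals the current one, and a bad state always receives a budget of at least 1. Together these imply that the probability of the shielded agent ever reaching a bad state is at most the initial budget. This holds provided Pmin is computed exactly; value iteration only approximates it in general, though it is exact on the toy MDP here. The dictionary encoding and the names `BudgetShield`, `min_reach_prob` and `example` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: state -> action -> list of (probability, successor).
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]


def min_reach_prob(mdp: MDP, bad: Set[str], iters: int = 1000) -> Dict[str, float]:
    """Value iteration from below for Pmin(s), the smallest achievable
    probability of eventually reaching a bad state from s."""
    p = {s: (1.0 if s in bad else 0.0) for s in mdp}
    for _ in range(iters):
        p = {
            s: 1.0 if s in bad
            else min(sum(pr * p[t] for pr, t in succ) for succ in mdp[s].values())
            for s in mdp
        }
    return p


class BudgetShield:
    """Hypothetical online shield that carries a remaining risk budget."""

    def __init__(self, mdp: MDP, bad: Set[str], init: str, budget: float) -> None:
        self.mdp, self.bad = mdp, bad
        self.p = min_reach_prob(mdp, bad)
        if budget < self.p[init]:
            raise ValueError("budget is below the minimal achievable risk")
        self.state, self.budget = init, budget

    def q(self, s: str, a: str) -> float:
        # Risk of taking a in s and acting as safely as possible afterwards.
        return sum(pr * self.p[t] for pr, t in self.mdp[s][a])

    def allowed(self) -> Set[str]:
        # Actions whose risk fits in the remaining budget; with exact Pmin the
        # safest action always qualifies, so this set is never empty.
        return {a for a in self.mdp[self.state] if self.q(self.state, a) <= self.budget}

    def step(self, action: str, successor: str) -> None:
        if action not in self.allowed():
            raise ValueError(f"action {action!r} is blocked by the shield")
        slack = self.budget - self.q(self.state, action)
        # Each successor t receives Pmin(t) + slack, so the expected next budget,
        # sum_t P(t | s, a) * (Pmin(t) + slack), equals the current budget.
        self.state, self.budget = successor, self.p[successor] + slack


# Toy MDP: "stop" is perfectly safe; "risky" fails with probability 0.1
# and otherwise returns to s0.
example: MDP = {
    "s0": {"stop": [(1.0, "safe")], "risky": [(0.1, "bad"), (0.9, "s0")]},
    "safe": {"stay": [(1.0, "safe")]},
    "bad": {"stay": [(1.0, "bad")]},
}

shield = BudgetShield(example, {"bad"}, init="s0", budget=0.25)
assert shield.allowed() == {"stop", "risky"}
shield.step("risky", "s0")  # budget drops to about 0.15
assert shield.allowed() == {"stop", "risky"}
shield.step("risky", "s0")  # budget drops to about 0.05, below Q(s0, risky) = 0.1
assert shield.allowed() == {"stop"}
```

On this toy MDP, the budget lets the agent take the risky action at most twice, for a total risk of 0.1 + 0.9 × 0.1 = 0.19, which stays within the initial budget of 0.25.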

Why it matters

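This paper tackles probabilistic safety for autonomous agents: an unsafe outcome is tolerated with an acceptable probability rather than ruled out entirely, which is often a more realistic goal than absolute safety. It provides a formal framework that clarifies which guarantees of classical shields survive in this setting, along with shield constructions whose computational feasibility is demonstrated empirically, supporting the design of safer autonomous systems.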

Original Abstract

Shielding is a prominent model-based technique to ensure safety of autonomous agents. Classical shielding aims to ensure that nothing bad ever happens and comes with strong guarantees about safety and maximal permissiveness. However, shielding systems for probabilistic safety, where something bad is allowed to happen with an acceptable probability, has proven to be more intricate. This paper presents a formal framework that conservatively extends classical shields to probabilistic safety. In this framework, we (i) demonstrate the impossibility of preserving the strong guarantees on safety and permissiveness, (ii) provide natural shields with weaker guarantees, and (iii) introduce offline and online shield constructions ensuring strong safety guarantees. The empirical evaluation highlights the practical advantages of the new shields, as well as their computational feasibility.
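A toy MDP illustrates why shielding for probabilistic safety is more intricate than classical shielding. The sketch below uses a hypothetical dictionary encoding of an MDP and one natural-looking local shield, which is our assumption for illustration and not necessarily the paper's definition. The shield permits action a in state s whenever taking a and then acting as safely as possible reaches a bad state with probability at most λ (`lam`). Each action passes this local test, yet an agent that keeps choosing the permitted risky action reaches the bad state almost surely. The names `threshold_shield`, `reach_prob_under_policy` and `example` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: state -> action -> list of (probability, successor).
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]


def min_reach_prob(mdp: MDP, bad: Set[str], iters: int = 1000) -> Dict[str, float]:
    """Value iteration from below for Pmin(s), the smallest probability of
    eventually reaching a bad state that any policy can achieve from s."""
    p = {s: (1.0 if s in bad else 0.0) for s in mdp}
    for _ in range(iters):
        p = {
            s: 1.0 if s in bad
            else min(sum(pr * p[t] for pr, t in succ) for succ in mdp[s].values())
            for s in mdp
        }
    return p


def threshold_shield(mdp: MDP, bad: Set[str], lam: float) -> Dict[str, Set[str]]:
    """Hypothetical local shield: allow a in s if taking a and then acting
    as safely as possible reaches a bad state with probability <= lam."""
    p = min_reach_prob(mdp, bad)
    return {
        s: {a for a, succ in mdp[s].items() if sum(pr * p[t] for pr, t in succ) <= lam}
        for s in mdp
        if s not in bad
    }


def reach_prob_under_policy(
    mdp: MDP, bad: Set[str], policy: Dict[str, str], iters: int = 1000
) -> Dict[str, float]:
    """Probability of eventually reaching a bad state when a memoryless
    policy fixes one action in every non-bad state."""
    x = {s: (1.0 if s in bad else 0.0) for s in mdp}
    for _ in range(iters):
        x = {
            s: 1.0 if s in bad else sum(pr * x[t] for pr, t in mdp[s][policy[s]])
            for s in mdp
        }
    return x


# Toy MDP: "stop" is perfectly safe; "risky" fails with probability 0.1
# and otherwise returns to s0.
example: MDP = {
    "s0": {"stop": [(1.0, "safe")], "risky": [(0.1, "bad"), (0.9, "s0")]},
    "safe": {"stay": [(1.0, "safe")]},
    "bad": {"stay": [(1.0, "bad")]},
}

shield = threshold_shield(example, {"bad"}, lam=0.2)
# Both actions pass the local test in s0: Q(s0, stop) = 0, Q(s0, risky) = 0.1.
assert shield["s0"] == {"stop", "risky"}

# An agent that always picks the permitted "risky" action fails almost surely.
risk = reach_prob_under_policy(example, {"bad"}, {"s0": "risky", "safe": "stay"})
print(f"risk from s0: {risk['s0']:.6f}")  # close to 1.0, far above lam = 0.2
```

With `lam=0.0`, the same test permits only `stop` in s0, which mirrors the classical "nothing bad ever happens" shield on this example. The failure appears only once a nonzero risk is tolerated independently at every step.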
