ArXiv TLDR

Operating Within the Operational Design Domain: Zero-Shot Perception with Vision-Language Models

arXiv: 2605.07649

Berkehan Ünal, Hauke Dierend, Dren Fazlija, Christopher Plachetka

cs.CV, cs.AI, cs.RO

TLDR

This paper demonstrates how Vision-Language Models can perform zero-shot perception of Operational Design Domain elements, enhancing safety for autonomous systems.

Key contributions

  • Empirical study of zero-shot ODD classification and detection using four VLMs on a custom dataset and Mapillary Vistas, with failure analyses.
  • Ablation analysis of zero-shot optimization strategies, including a cost-performance overview.
  • Reusable prompting templates and guidance for adapting VLMs to ODD perception.

Why it matters

Autonomous systems must operate safely within defined conditions, captured by the Operational Design Domain (ODD). This paper shows that VLMs can perceive these conditions adaptively, without task-specific training data, enabling more robust and auditable perception for safety-critical applications such as automated driving.

Original Abstract

Over the last few years, research on autonomous systems has matured to such a degree that the field is increasingly well-positioned to translate research into practical, stakeholder-driven use cases across well-defined domains. However, for a wide-scale practical adoption of autonomous systems, adherence to safety regulations is crucial. Many regulations are influenced by the Operational Design Domain (ODD), which defines the specific conditions in which an autonomous agent can function. This is especially relevant for Automated Driving Systems (ADS), as a dependable perception of ODD elements is essential for safe implementation and auditing. Vision-language models (VLMs) integrate visual recognition and language reasoning, functioning without task-specific training data, which makes them suitable for adaptable ODD perception. To assess whether VLMs can function as zero-shot "ODD sensors" that adapt to evolving definitions, we contribute (i) an empirical study of zero-shot ODD classification and detection using four VLMs on a custom dataset and Mapillary Vistas, along with failure analyses; (ii) an ablation of zero-shot optimization strategies with a cost-performance overview; and (iii) a suite of reusable prompting templates with guidance for adaptation. Our findings indicate that definition-anchored chain-of-thought prompting with persona decomposition performs best, while other methods may result in reduced recall. Overall, our results pave the way for transparent and effective ODD-based perception in safety-critical applications.
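The abstract's best-performing strategy, definition-anchored chain-of-thought prompting with persona decomposition, can be sketched as a prompt builder. The element names, definitions, and wording below are hypothetical stand-ins for illustration, not the paper's actual templates.

```python
# Illustrative sketch of definition-anchored chain-of-thought prompting with
# persona decomposition for zero-shot ODD classification. Definitions and
# phrasing are hypothetical, not taken from the paper.

ODD_DEFINITIONS = {
    "rain": "Precipitation is visible as falling drops, a wet road surface, or spray.",
    "night": "Ambient illumination comes mainly from artificial light sources.",
}

def build_odd_prompt(element: str) -> str:
    """Compose a zero-shot classification prompt for one ODD element."""
    definition = ODD_DEFINITIONS[element]
    return (
        # Persona decomposition: assign the model an expert role.
        "You are a safety auditor for an automated driving system.\n"
        # Definition anchoring: state the exact ODD definition to apply.
        f"The ODD element '{element}' is defined as: {definition}\n"
        # Chain-of-thought: request stepwise reasoning before the verdict.
        "Examine the image, reason step by step about the relevant visual "
        "evidence, then answer on a final line with 'PRESENT' or 'ABSENT'."
    )

print(build_odd_prompt("rain"))
```

The resulting text would be sent alongside the image to a VLM; anchoring the verdict to an explicit definition is what lets the same pipeline track evolving ODD specifications without retraining.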
