Operating Within the Operational Design Domain: Zero-Shot Perception with Vision-Language Models
Berkehan Ünal, Hauke Dierend, Dren Fazlija, Christopher Plachetka
TL;DR
This paper demonstrates how Vision-Language Models can perform zero-shot perception of Operational Design Domain elements, enhancing safety for autonomous systems.
Key contributions
- Empirical study of zero-shot ODD classification and detection using four VLMs on a custom dataset and Mapillary Vistas, including failure analyses.
- Ablation analysis of zero-shot optimization strategies, including a cost-performance overview.
- Reusable prompting templates and guidance for adapting VLMs to ODD perception.
Why it matters
Autonomous systems must operate safely within defined conditions, formalized as the Operational Design Domain (ODD). This paper shows that VLMs can perceive these conditions zero-shot, without task-specific training data, enabling more robust and auditable perception for safety-critical applications such as automated driving.
Original Abstract
Over the last few years, research on autonomous systems has matured to such a degree that the field is increasingly well-positioned to translate research into practical, stakeholder-driven use cases across well-defined domains. However, for a wide-scale practical adoption of autonomous systems, adherence to safety regulations is crucial. Many regulations are influenced by the Operational Design Domain (ODD), which defines the specific conditions in which an autonomous agent can function. This is especially relevant for Automated Driving Systems (ADS), as a dependable perception of ODD elements is essential for safe implementation and auditing. Vision-language models (VLMs) integrate visual recognition and language reasoning, functioning without task-specific training data, which makes them suitable for adaptable ODD perception. To assess whether VLMs can function as zero-shot "ODD sensors" that adapt to evolving definitions, we contribute (i) an empirical study of zero-shot ODD classification and detection using four VLMs on a custom dataset and Mapillary Vistas, along with failure analyses; (ii) an ablation of zero-shot optimization strategies with a cost-performance overview; and (iii) a suite of reusable prompting templates with guidance for adaptation. Our findings indicate that definition-anchored chain-of-thought prompting with persona decomposition performs best, while other methods may result in reduced recall. Overall, our results pave the way for transparent and effective ODD-based perception in safety-critical applications.
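The abstract's best-performing strategy, definition-anchored chain-of-thought prompting with persona decomposition, can be illustrated with a small prompt-builder sketch. All names, definitions, and wording below are illustrative assumptions, not the authors' actual templates:

```python
# Hypothetical sketch of definition-anchored chain-of-thought
# prompting with persona decomposition for zero-shot ODD perception.
# The ODD element names and definitions are invented examples.

ODD_DEFINITIONS = {
    "rain": "Precipitation falling as visible water droplets or streaks, "
            "often with wet road surfaces and reduced visibility.",
    "night": "Ambient illumination dominated by artificial light sources, "
             "with a dark sky and active vehicle headlights.",
}

def build_odd_prompt(element: str) -> str:
    """Assemble a zero-shot prompt asking a VLM whether an image
    shows the given ODD element."""
    definition = ODD_DEFINITIONS[element]
    return (
        # Persona decomposition: the model answers in an expert role.
        "You are a certified ADS safety auditor reviewing camera images.\n"
        # Definition anchoring: the ODD element is defined explicitly
        # instead of relying on the model's prior understanding.
        f"Definition of '{element}': {definition}\n"
        # Chain-of-thought: reason step by step before the verdict.
        "Think step by step: (1) describe the relevant visual evidence, "
        "(2) compare it against the definition above, "
        f"(3) answer 'yes' or 'no': does the image show '{element}'?"
    )

print(build_odd_prompt("rain"))
```

In practice the returned string would be sent alongside an image to a VLM; the definition block is what lets the same template track an evolving ODD specification by editing only the dictionary.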