Failure Identification in Imitation Learning Via Statistical and Semantic Filtering

April 15, 2026 · arXiv:2604.13788

Quentin Rolland, Fabrice Mayran de Chamisso, Jean-Baptiste Mouret

cs.RO · cs.CV

TLDR

FIDeL identifies robot failures by combining anomaly detection with semantic filtering using a VLM, outperforming baselines.
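
The two-stage logic in the TLDR can be sketched as follows. This is a minimal illustration, not FIDeL's code. `identify_failure`, `Verdict`, and the `vlm_confirms_failure` callback are hypothetical names, and the callback stands in for whatever VLM query the paper actually issues. A frame is flagged only if some anomaly score exceeds its threshold. Only flagged frames reach the semantic filter, which decides whether the anomaly is a genuine failure or a benign deviation.

```python
from dataclasses import dataclass
from typing import Callable, Union

import numpy as np


@dataclass(frozen=True)
class Verdict:
    anomalous: bool  # some score exceeded its threshold
    failure: bool    # the semantic filter confirmed a genuine failure


def identify_failure(score_map: np.ndarray,
                     threshold: Union[float, np.ndarray],
                     vlm_confirms_failure: Callable[[np.ndarray], bool]) -> Verdict:
    """Stage 1: statistical anomaly test. Stage 2: semantic filter on anomalies only.

    `threshold` may be a scalar or an array broadcastable to `score_map`
    (e.g. a spatially varying threshold).
    """
    mask = score_map > threshold
    if not mask.any():
        # Nominal frame: the (expensive) VLM is never queried.
        return Verdict(anomalous=False, failure=False)
    # Hypothetical VLM call: is the flagged region a real failure
    # or a benign deviation from the demonstrations?
    return Verdict(anomalous=True, failure=bool(vlm_confirms_failure(mask)))
```

Running the statistical test first means the VLM is queried only on the small fraction of frames that look anomalous, and its role reduces to vetoing benign anomalies.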

Key contributions

Introduces FIDeL, a policy-independent module for robust failure identification in imitation learning.
Leverages optimal transport matching and conformal prediction for anomaly scoring and thresholding.
Employs a Vision-Language Model (VLM) for semantic filtering to distinguish failures from benign anomalies.
Presents BotFails, a new multimodal dataset for real-world robotic failure detection.
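
For the thresholding step, a plain split-conformal quantile conveys the idea. The paper uses its own spatio-temporal extension of conformal prediction. This sketch only computes one threshold per timestep from anomaly scores of held-out nominal demonstrations, and the function names are hypothetical. Under exchangeability, a new nominal score at a given timestep exceeds its threshold with probability at most `alpha`.

```python
import math

import numpy as np


def conformal_threshold(scores: np.ndarray, alpha: float) -> float:
    """Split-conformal threshold from nominal calibration scores."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration samples for this alpha
    return float(scores[k - 1])


def per_timestep_thresholds(calib: np.ndarray, alpha: float) -> np.ndarray:
    """calib has shape (n_demos, T): one anomaly score per demo per timestep."""
    return np.array([conformal_threshold(calib[:, t], alpha)
                     for t in range(calib.shape[1])])
```

With 10 calibration demonstrations and `alpha = 0.2`, each threshold is the 9th smallest score at that timestep, since ceil(11 × 0.8) = 9.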

Why it matters

Imitation learning policies are brittle in real-world deployments, where unexpected events can cause failed executions. Existing anomaly detectors flag any deviation from the training distribution; this paper closes a critical gap by separating genuine failures from benign anomalies. More reliable failure detection makes robotic systems safer in complex environments.

Original Abstract

Imitation learning (IL) policies in robotics deliver strong performance in controlled settings but remain brittle in real-world deployments: rare events such as hardware faults, defective parts, unexpected human actions, or any state that lies outside the training distribution can lead to failed executions. Vision-based Anomaly Detection (AD) methods have emerged as an appropriate solution to detect these anomalous failure states but do not distinguish failures from benign deviations. We introduce FIDeL (Failure Identification in Demonstration Learning), a policy-independent failure detection module. Leveraging recent AD methods, FIDeL builds a compact representation of nominal demonstrations and aligns incoming observations via optimal transport matching to produce anomaly scores and heatmaps. Spatio-temporal thresholds are derived with an extension of conformal prediction, and a Vision-Language Model (VLM) performs semantic filtering to discriminate benign anomalies from genuine failures. We also introduce BotFails, a multimodal dataset of real-world tasks for failure detection in robotics. FIDeL consistently outperforms state-of-the-art baselines, yielding +5.30% AUROC in anomaly detection and +17.38% failure-detection accuracy on BotFails compared to existing methods.
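
The optimal-transport matching step can be approximated with entropic OT (Sinkhorn iterations). This is an assumed formulation, not the paper's. Patch features for the incoming frame and a nominal feature bank are taken as given (e.g. from a pretrained vision backbone), marginals are uniform, and each patch's anomaly score is its expected matching cost under the transport plan. `sinkhorn` and `ot_patch_scores` are hypothetical names. Reshaping the per-patch scores to the patch grid gives a heatmap.

```python
import numpy as np


def sinkhorn(cost: np.ndarray, eps: float = 0.05, n_iters: int = 200) -> np.ndarray:
    """Entropic OT plan between uniform marginals (rows: test patches, cols: bank)."""
    m, n = cost.shape
    a = np.full(m, 1.0 / m)  # mass of each test patch
    b = np.full(n, 1.0 / n)  # mass of each nominal bank feature
    K = np.exp(-cost / eps)  # Gibbs kernel; cost is assumed scaled to [0, 1]
    u, v = np.ones(m), np.ones(n)
    for _ in range(n_iters):
        u = a / (K @ v)      # match row marginals
        v = b / (K.T @ u)    # match column marginals
    return u[:, None] * K * v[None, :]


def ot_patch_scores(test_feats: np.ndarray, bank_feats: np.ndarray,
                    eps: float = 0.05, n_iters: int = 200) -> np.ndarray:
    """Per-patch anomaly score = expected squared distance under the OT plan."""
    diff = test_feats[:, None, :] - bank_feats[None, :, :]
    cost = (diff ** 2).sum(axis=-1)        # (m, n) squared Euclidean costs
    scaled = cost / (cost.max() + 1e-12)   # keep exp(-cost / eps) from underflowing
    plan = sinkhorn(scaled, eps, n_iters)
    return (plan * cost).sum(axis=1) / plan.sum(axis=1)


# Toy check: five nominal features on a diagonal, one test patch replaced by an outlier.
bank = np.arange(5, dtype=float)[:, None] * np.ones((1, 2))
test = bank.copy()
test[2] = [10.0, -10.0]
scores = ot_patch_scores(test, bank)
heatmap = scores.reshape(1, 5)  # a real frame would use its (H, W) patch grid
```

Compared with scoring each patch by its nearest neighbour in the bank, the transport constraint prevents many test patches from all matching the same bank feature.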
