How to utilize failure demo data?: Effective data selection for imitation learning using distribution differences in attention mechanism
Kana Miyamoto, Kanata Suzuki, Tetsuya Ogata
TLDR
This paper proposes a method for using failure demonstrations in imitation learning: it learns latent representations of success-failure discrepancies and incorporates them into the attention mechanism.
Key contributions
- Learns latent success-failure discrepancies, integrating them into an attention mechanism for imitation learning.
- Selects an appropriate latent mode from the initial observation at inference time to improve action stability.
- Introduces a post-training metric to quantify attention discrepancy for selecting beneficial failure data.
- Demonstrates improved task success rates and effective failure data identification in simulations.
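The post-training metric in the third contribution can be illustrated with a minimal sketch. The summary does not specify the paper's formula, so everything here is an assumption made for illustration: attention maps are treated as flattened probability distributions, the discrepancy is the Jensen-Shannon divergence to the mean attention over successful demonstrations, and the names `js_divergence`, `attention_discrepancy`, and `rank_failures` are hypothetical.

```python
import numpy as np

def normalize(attn, eps=1e-12):
    """Turn non-negative attention weights into a probability distribution."""
    attn = np.asarray(attn, dtype=float) + eps
    return attn / attn.sum()

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log), bounded by log(2)."""
    p, q = normalize(p), normalize(q)
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def attention_discrepancy(failure_attn, success_attns):
    """Assumed metric: divergence between one failure sample's attention
    and the mean attention over successful demonstrations."""
    reference = np.mean([normalize(a) for a in success_attns], axis=0)
    return js_divergence(failure_attn, reference)

def rank_failures(failure_attns, success_attns):
    """Score every failure sample and return indices sorted by ascending score."""
    scores = [attention_discrepancy(f, success_attns) for f in failure_attns]
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order, scores

# Toy attention weights over three input regions (made-up values).
success = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]
failures = [[0.65, 0.25, 0.1], [0.1, 0.1, 0.8]]
order, scores = rank_failures(failures, success)
```

Which end of the ranking marks a failure sample as beneficial, and any selection threshold, is not stated in the summary, so the sketch only produces scores and an ordering.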
Why it matters
Failures are unavoidable during human data collection, yet imitation learning policies are typically trained only on successful demonstrations. This paper uses failure data directly, without the additional data processing or iterative rollout-based policy updates that many existing approaches require. The simulation results suggest that the method can support more efficient use of collected demonstrations in robotic data collection pipelines.
Original Abstract
Imitation learning for robotic tasks has relied primarily on policies trained only on successful demonstrations, although failures are unavoidable during human data collection. Many existing approaches for exploiting failure data require additional data processing or iterative policy updates through autonomous rollouts, making it difficult to directly and stably utilize failure data accumulated during data collection. In this work, we propose a method that learns latent representations of success-failure discrepancies and incorporates them into the attention mechanism. During inference, an appropriate latent mode is selected from the initial observation to improve action stability. Furthermore, we introduce a post-training metric that quantifies the attention discrepancy between each failure sample and successful demonstrations to select failure data. Simulation results show that the proposed method improves task success rates when trained with failure data and that the proposed metric identifies failure samples that are beneficial for learning when combined with successful demonstrations. These results suggest that the proposed method can support more efficient use of collected demonstrations in robotic data collection pipelines.