C-CoT: Counterfactual Chain-of-Thought with Vision-Language Models for Safe Autonomous Driving

arXiv:2605.10744

Kefei Tian, Yuansheng Lian, Kai Yang, Xiangdong Chen, Shen Li

cs.CV, cs.RO

TLDR

C-CoT combines VLMs with counterfactual chain-of-thought reasoning to make safer autonomous driving decisions, especially in complex, high-risk scenarios.

Key contributions

  • Proposes C-CoT, a five-stage VLM framework for safe autonomous driving decision-making (see the pipeline sketch after this list).
  • Introduces a meta-action evaluation tree for explicit counterfactual risk reasoning.
  • Reduces the collision rate to 3.52%, raises risk prediction recall to 81.9%, and lowers L2 error to 1.98 m.
  • Enhances robustness in rare and out-of-distribution driving scenarios by establishing causal links.
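
A minimal sketch of how the five-stage pipeline could be driven as sequential VLM prompts. The stage names follow the paper; `query_vlm`, the prompt wording, and the context-accumulation scheme are illustrative assumptions, not the authors' implementation.

```python
# Illustrative five-stage C-CoT loop. Stage names follow the paper; the
# prompts and the query_vlm stub are assumptions for demonstration.

STAGES = [
    ("scene_description", "Describe the driving scene in this image."),
    ("critical_object_identification",
     "List the objects most critical to the ego vehicle."),
    ("risk_prediction", "Predict potential risks posed by these objects."),
    ("counterfactual_risk_reasoning",
     "For each candidate meta-action, reason about what would happen "
     "if the ego vehicle executed it, and flag unsafe outcomes."),
    ("final_action_planning", "Select the safest action and justify it."),
]

def query_vlm(image: bytes, prompt: str, context: str) -> str:
    """Hypothetical stand-in for a real VLM call (e.g. a fine-tuned Qwen2.5-VL)."""
    raise NotImplementedError("connect to your VLM backend here")

def c_cot(image: bytes) -> dict[str, str]:
    """Run the five stages in order, feeding earlier answers into later prompts."""
    context, outputs = "", {}
    for name, prompt in STAGES:
        answer = query_vlm(image, prompt, context)
        outputs[name] = answer
        context += f"\n[{name}] {answer}"  # accumulate the chain of thought
    return outputs
```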

Why it matters

Autonomous driving struggles in complex, high-risk scenarios in part because most VLM-based approaches lack reflective and causal reasoning. C-CoT addresses this by combining counterfactual reasoning with a structured meta-action evaluation tree, establishing causal links between action choices and safety outcomes. This improves robustness and decision-making in critical, long-tail situations.
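
One plausible shape for the meta-action evaluation tree: ego meta-actions are enumerated as lateral × longitudinal combinations, and each leaf stores the VLM's counterfactual verdict for that combination. The action vocabulary and the selection policy below are assumptions; the paper's exact tree design may differ.

```python
# Hypothetical meta-action evaluation tree: each leaf pairs a lateral and a
# longitudinal meta-action with a counterfactual verdict. Action names and
# the fallback policy are assumptions, not taken from the paper.

from dataclasses import dataclass

LATERAL = ["keep_lane", "change_left", "change_right"]
LONGITUDINAL = ["accelerate", "keep_speed", "decelerate", "stop"]

@dataclass
class Leaf:
    lateral: str
    longitudinal: str
    consequence: str = ""   # VLM's rollout: "what would happen if..."
    safe: bool = False      # verdict extracted from that rollout

def build_tree() -> list[Leaf]:
    """Enumerate every lateral x longitudinal combination as a leaf."""
    return [Leaf(lat, lon) for lat in LATERAL for lon in LONGITUDINAL]

def select_action(leaves: list[Leaf]) -> Leaf:
    """Pick a combination judged safe; brake to a stop if none is."""
    for leaf in leaves:
        if leaf.safe:
            return leaf
    return Leaf("keep_lane", "stop", "fallback: no safe combination", True)
```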

Original Abstract

Safety-critical planning in complex environments, particularly at urban intersections, remains a fundamental challenge for autonomous driving. Existing methods, whether rule-based or data-driven, frequently struggle to capture complex scene semantics, infer potential risks, and make reliable decisions in rare, high-risk situations. While vision-language models (VLMs) offer promising approaches for safe decision-making in these environments, most current approaches lack reflective and causal reasoning, thereby limiting their overall robustness. To address this, we propose a counterfactual chain-of-thought (C-CoT) framework that leverages VLMs to decompose driving decisions into five sequential stages: scene description, critical object identification, risk prediction, counterfactual risk reasoning, and final action planning. Within the counterfactual reasoning stage, we introduce a structured meta-action evaluation tree to explicitly assess the potential consequences of alternative action combinations. This self-reflective reasoning establishes causal links between action choices and safety outcomes, improving robustness in long-tail and out-of-distribution scenarios. To validate our approach, we construct the DeepAccident-CCoT dataset based on the DeepAccident benchmark and fine-tune a Qwen2.5-VL (7B) model using low-rank adaptation. Our model achieves a risk prediction recall of 81.9%, reduces the collision rate to 3.52%, and lowers L2 error to 1.98 m. Ablation studies further confirm the critical role of counterfactual reasoning and the meta-action evaluation tree in enhancing safety and interpretability.
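
The abstract reports fine-tuning a Qwen2.5-VL (7B) model with low-rank adaptation. Below is a minimal LoRA setup sketch using Hugging Face `transformers` and `peft`; the rank, alpha, dropout, and target modules are common defaults, not the paper's reported hyperparameters.

```python
# Minimal LoRA wiring for Qwen2.5-VL-7B with Hugging Face peft. All
# hyperparameters here are generic defaults, not the paper's settings.

from transformers import Qwen2_5_VLForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)

lora = LoraConfig(
    r=16,               # adapter rank (assumed, not from the paper)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora)
model.print_trainable_parameters()  # only the low-rank adapters are trainable
```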
