ArXiv TLDR

Driver-WM: A Driver-Centric Traffic-Conditioned Latent World Model for In-Cabin Dynamics Rollout

arXiv:2605.05092

Haozhuang Chi, Daosheng Qiu, Hao Su, Haochen Liu, Zirui Li + 2 more

cs.RO, cs.AI, cs.CV

TLDR

Driver-WM is a latent world model that forecasts in-cabin driver dynamics, causally conditioned on external traffic, to support safer L2/L3 driving automation.

Key contributions

  • Introduces Driver-WM, a latent world model for forecasting in-cabin driver dynamics.
  • Causally conditions driver behavior on external traffic context using a dual-stream architecture.
  • Unifies physical kinematics forecasting with behavioral and emotional semantic recognition.
  • Reports robust long-horizon geometric forecasting for reactive high-motion maneuvers and improved semantic alignment for both driver and traffic states on a multi-task assistive driving benchmark.

Why it matters

Most driving world models forecast the external environment and overlook multi-step in-cabin driver dynamics, which are crucial for L2/L3 shared control. Driver-WM addresses this gap by rolling out predicted driver reactions conditioned on traffic context, which could let a system respond proactively during shared-control transitions. The explicit external-to-internal conditioning also allows controlled test-time interventions for analyzing how the model responds.

Original Abstract

Safe L2/L3 driving automation requires anticipating human-in-the-loop reactions during shared-control transitions. While most driving world models forecast the external environment, in-cabin intelligence remains strictly recognition-oriented and lacks multi-step rollout capabilities for driver dynamics. We introduce Driver-WM, a driver-centric latent world model that rolls out in-cabin dynamics causally conditioned on out-cabin traffic context. This formulation unifies physical kinematics forecasting with auxiliary behavioral and emotional semantic recognition. Operating in a compact latent space constructed from frozen vision-language features, Driver-WM adopts a dual-stream architecture to separately encode external traffic and internal driver states. These streams are directionally coupled via a gated causal injection mechanism, which uses a learned vector gate to modulate external contextual perturbations while strictly enforcing temporal causality. Evaluations on a multi-task assistive driving benchmark demonstrate that Driver-WM yields robust long-horizon geometric forecasting for reactive high-motion maneuvers and improves semantic alignment for both driver and traffic states. Finally, the explicit external-to-internal conditioning allows for controlled test-time interventions to systematically analyze mechanism responses.
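
The abstract does not give the equations for the gated causal injection, so the following is only a minimal sketch of one plausible form, not the authors' implementation. It assumes the driver latent at step t receives an additive perturbation computed from traffic latents up to and including step t, scaled elementwise by a learned vector gate. The function name `gated_causal_injection`, the running-mean traffic summary, and the weight names `W_ctx`, `W_gate`, and `b_gate` are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_causal_injection(driver, traffic, W_ctx, W_gate, b_gate):
    """Inject traffic context into driver latents, one direction only.

    driver:  (T, D) driver-stream latents
    traffic: (T, C) traffic-stream latents
    W_ctx:   (C, D) projects traffic context into driver space (assumed)
    W_gate:  (D + C, D) weights of the vector gate (assumed)
    b_gate:  (D,) gate bias (assumed)
    Returns updated driver latents of shape (T, D).
    """
    T, _ = driver.shape
    out = np.empty_like(driver)
    for t in range(T):
        # Causal summary: only traffic steps 0..t are visible at step t.
        # A running mean is a stand-in for whatever the paper uses.
        ctx = traffic[: t + 1].mean(axis=0)
        perturbation = ctx @ W_ctx
        # Vector gate: one value in (0, 1) per driver latent dimension.
        gate = sigmoid(np.concatenate([driver[t], ctx]) @ W_gate + b_gate)
        out[t] = driver[t] + gate * perturbation
    return out


# Illustrative test-time intervention with random, untrained weights:
# alter the traffic stream from step 4 onward and compare rollouts.
rng = np.random.default_rng(0)
T, D, C = 6, 4, 3
driver = rng.normal(size=(T, D))
traffic = rng.normal(size=(T, C))
W_ctx = rng.normal(size=(C, D))
W_gate = rng.normal(size=(D + C, D))
b_gate = np.zeros(D)

baseline = gated_causal_injection(driver, traffic, W_ctx, W_gate, b_gate)
intervened_traffic = traffic.copy()
intervened_traffic[4:] += 5.0
intervened = gated_causal_injection(driver, intervened_traffic, W_ctx, W_gate, b_gate)
```

Under these assumptions, the coupling is directional because the traffic latents are read but never updated from the driver stream, and causality holds because step t never reads traffic from later steps. Comparing `baseline` with `intervened` mirrors the kind of controlled external intervention the abstract mentions, in toy form.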
