DeepSight: Long-Horizon World Modeling via Latent States Prediction for End-to-End Autonomous Driving
Lingjun Zhang, Changjie Wu, Linzhe Shi, Jiangyang Li, Jiaxin Liu + 4 more
TLDR
DeepSight improves end-to-end autonomous driving with a world model predicting long-horizon latent states and adaptive text reasoning.
Key contributions
- Proposes DeepSight, a driving world model for end-to-end autonomous driving.
- Performs parallel prediction of latent semantic features in BEV space for long-horizon future world states.
- Introduces an efficient, adaptive text reasoning mechanism using social knowledge for long-tail scenarios.
- Achieves state-of-the-art (SOTA) results on the closed-loop Bench2Drive benchmark.
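To make the "parallel prediction" contribution concrete, here is a minimal sketch of predicting several future BEV latent maps in one pass instead of rolling them out step by step. It is only an illustration and not the authors' architecture: the function name, the per-horizon linear heads, and all shapes are assumptions introduced here.

```python
import numpy as np

def predict_future_latents_parallel(bev_latent, horizon_weights, horizon_bias):
    """Predict T future BEV latent maps from the current one in a single pass.

    Hypothetical setup (not from the paper):
      bev_latent:      (H, W, C) current BEV feature map
      horizon_weights: (T, C, C) one linear head per future time step
      horizon_bias:    (T, C)    one bias vector per future time step
    Returns an array of shape (T, H, W, C).
    """
    # Apply all T heads at once; no future frame depends on another
    # predicted frame, so nothing is rolled out autoregressively.
    future = np.einsum("hwc,tcd->thwd", bev_latent, horizon_weights)
    return future + horizon_bias[:, None, None, :]

rng = np.random.default_rng(0)
H, W, C, T = 4, 4, 8, 6  # toy sizes chosen for illustration
bev = rng.standard_normal((H, W, C))
weights = rng.standard_normal((T, C, C)) * 0.1
bias = np.zeros((T, C))

future = predict_future_latents_parallel(bev, weights, bias)
print(future.shape)  # (6, 4, 4, 8)
```

Because each horizon is computed independently from the current latent, the cost of a long horizon is one batched operation rather than T sequential steps, and errors from early predicted frames do not feed into later ones.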
Why it matters
Most VLM-based autonomous driving systems borrow their reasoning mechanisms directly from general domains instead of tailoring them to driving scenarios, particularly in their visual reasoning modules. DeepSight addresses this gap with long-horizon world modeling in BEV space and an adaptive text reasoning mechanism aimed at challenging long-tail scenarios.
Original Abstract
End-to-end autonomous driving systems are increasingly integrating Vision-Language Model (VLM) architectures, incorporating text reasoning or visual reasoning to enhance the robustness and accuracy of driving decisions. However, the reasoning mechanisms employed in most methods are direct adaptations from general domains, lacking in-depth exploration tailored to autonomous driving scenarios, particularly within visual reasoning modules. In this paper, we propose a driving world model that performs parallel prediction of latent semantic features for consecutive future frames in the bird's-eye-view (BEV) space, thereby enabling long-horizon modeling of future world states. We also introduce an efficient and adaptive text reasoning mechanism that utilizes additional social knowledge and reasoning capabilities to further improve driving performance in challenging long-tail scenarios. We present a novel, efficient, and effective approach that achieves state-of-the-art (SOTA) results on the closed-loop Bench2drive benchmark. Codes are available at: https://github.com/hotdogcheesewhite/DeepSight.