CollideNet: Hierarchical Multi-scale Video Representation Learning with Disentanglement for Time-To-Collision Forecasting

April 17, 2026
arXiv:2604.16240

Nishq Poorav Desai, Ali Etemad, Michael Greenspan

cs.CV

TLDR

CollideNet is a hierarchical multi-scale transformer for Time-to-Collision (TTC) forecasting. It disentangles non-stationarity, trend, and seasonality in its temporal stream, and the authors report state-of-the-art results on three public datasets.

Key contributions

Introduces CollideNet, a novel spatiotemporal hierarchical transformer for Time-to-Collision (TTC) forecasting.
Employs a spatial stream to aggregate multi-resolution information from video frames.
Utilizes a temporal stream that disentangles non-stationarity, trend, and seasonality components.
Achieves new state-of-the-art performance on three public TTC forecasting datasets.
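
To make the trend/seasonality idea concrete, here is a minimal sketch of a moving-average decomposition of a per-frame feature sequence, a technique commonly used in time-series transformers. It is not taken from the CollideNet code; the function names, the kernel size, and the per-sequence normalization used to handle non-stationarity are all assumptions made for illustration.

```python
import numpy as np

def normalize_series(x, eps=1e-5):
    """Remove the per-sequence mean and scale from a (T, D) sequence.

    Assumption: a simple stand-in for handling non-stationarity,
    not the paper's mechanism.
    """
    mean = x.mean(axis=0, keepdims=True)
    std = x.std(axis=0, keepdims=True) + eps
    return (x - mean) / std, mean, std

def decompose(x, kernel=5):
    """Split a (T, D) sequence into trend and seasonal parts.

    Trend is a moving average with edge-replicated padding;
    the seasonal part is the residual x - trend.
    """
    if kernel % 2 == 0:
        raise ValueError("kernel must be odd so the output keeps length T")
    pad = kernel // 2
    padded = np.concatenate(
        [np.repeat(x[:1], pad, axis=0), x, np.repeat(x[-1:], pad, axis=0)],
        axis=0,
    )
    window = np.ones(kernel) / kernel
    # Smooth each feature dimension independently along the time axis.
    trend = np.stack(
        [np.convolve(padded[:, d], window, mode="valid") for d in range(x.shape[1])],
        axis=1,
    )
    seasonal = x - trend
    return trend, seasonal

# Hypothetical 2-D feature sequence: a drifting sinusoid and a pure cosine.
t = np.arange(32, dtype=float)
features = np.stack([0.1 * t + np.sin(t), np.cos(t)], axis=1)
normed, mean, std = normalize_series(features)
trend, seasonal = decompose(normed, kernel=7)
```

By construction the two parts sum back to the normalized input, so a model can process the slowly varying trend and the oscillating residual separately and recombine them afterwards.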

Why it matters

Accurate Time-to-Collision (TTC) forecasting is crucial for collision prevention in autonomous systems. CollideNet processes video at multiple spatial and temporal scales and separates non-stationarity, trend, and seasonality in its temporal stream. The authors report state-of-the-art results on three public datasets, along with cross-dataset evaluations of generalization. More reliable TTC estimates could improve safety in collision-avoidance applications.

Original Abstract

Time-to-Collision (TTC) forecasting is a critical task in collision prevention, requiring precise temporal prediction and comprehending both local and global patterns encapsulated in a video, both spatially and temporally. To address the multi-scale nature of video, we introduce a novel spatiotemporal hierarchical transformer-based architecture called CollideNet, specifically catered for effective TTC forecasting. In the spatial stream, CollideNet aggregates information for each video frame simultaneously at multiple resolutions. In the temporal stream, along with multi-scale feature encoding, CollideNet also disentangles the non-stationarity, trend, and seasonality components. Our method achieves state-of-the-art performance in comparison to prior works on three commonly used public datasets, setting a new state-of-the-art by a considerable margin. We conduct cross-dataset evaluations to analyze the generalization capabilities of our method, and visualize the effects of disentanglement of the trend and seasonality components of the video data. We release our code at https://github.com/DeSinister/CollideNet/.
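
As a rough illustration of aggregating a frame "simultaneously at multiple resolutions", the sketch below average-pools a single frame onto several grids and concatenates the results into one descriptor. This is a generic spatial-pyramid construction, not the transformer-based spatial stream described in the abstract; the function names and pooling factors are hypothetical.

```python
import numpy as np

def avg_pool(frame, factor):
    """Downsample an (H, W, C) frame by averaging non-overlapping
    factor x factor blocks; rows/columns that do not fit are cropped."""
    h, w, c = frame.shape
    h2, w2 = h // factor, w // factor
    cropped = frame[: h2 * factor, : w2 * factor]
    return cropped.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))

def multi_scale_descriptor(frame, factors=(2, 4, 8)):
    """Pool the frame at each factor, flatten each grid, and
    concatenate everything into a single 1-D descriptor."""
    return np.concatenate([avg_pool(frame, f).ravel() for f in factors])

# Hypothetical 8x8 RGB frame: grids of 4x4, 2x2, and 1x1 cells.
frame = np.ones((8, 8, 3))
descriptor = multi_scale_descriptor(frame, factors=(2, 4, 8))
```

Fine grids keep local detail while coarse grids summarize global layout, which is the intuition behind combining resolutions before temporal modeling.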
