ArXiv TLDR

TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

2605.12220

Mohammad Khoshkdahan, Alexey Vinel

cs.CV, cs.AI, cs.LG, cs.RO

TLDR

TriBand-BEV is a real-time, LiDAR-only 3D pedestrian detector built on a three-band, height-aware BEV encoding. On KITTI it runs at 49 FPS and surpasses Complex-YOLO in pedestrian BEV AP.

Key contributions

  • Introduces TriBand-BEV, a height-aware BEV encoding for 3D LiDAR point clouds.
  • Reformulates 3D detection as a 2D problem, reconstructing 3D boxes from BEV outputs.
  • Achieves real-time 3D detection for pedestrians, cars, and cyclists in a single pass.
  • Surpasses Complex-YOLO on KITTI pedestrian BEV AP with gains of +12.6%, +7.5%, and +3.1% (easy, moderate, hard) while running at 49 FPS on a single consumer GPU.
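
To make the first contribution concrete, here is a minimal sketch of a three-band BEV encoding. It is not the authors' implementation: the grid ranges, cell size, height-band edges, and the choice to store the maximum reflectance per cell are all assumptions made for illustration, and the function name `encode_three_band_bev` is hypothetical. It uses only the Python standard library so it stays self-contained.

```python
from typing import List, Sequence, Tuple

# One LiDAR return: (x forward, y left, z up, reflectance).
Point = Tuple[float, float, float, float]


def encode_three_band_bev(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (0.0, 40.0),    # assumed forward range in metres
    y_range: Tuple[float, float] = (-20.0, 20.0),  # assumed lateral range in metres
    z_edges: Sequence[float] = (-2.0, -1.0, 0.0, 1.0),  # assumed band edges (3 bands)
    cell: float = 0.5,                             # assumed cell size in metres
) -> List[List[List[float]]]:
    """Return grid[band][row][col] with the max reflectance seen in each cell.

    Storing max reflectance is an illustrative choice; the source does not
    say what each channel of the BEV tensor holds.
    """
    n_rows = int(round((x_range[1] - x_range[0]) / cell))
    n_cols = int(round((y_range[1] - y_range[0]) / cell))
    n_bands = len(z_edges) - 1
    grid = [[[0.0] * n_cols for _ in range(n_rows)] for _ in range(n_bands)]

    for x, y, z, refl in points:
        # Drop points outside the horizontal grid or the covered height range.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        if not (z_edges[0] <= z < z_edges[-1]):
            continue
        # Clamp indices to guard against floating-point edge cases.
        row = min(int((x - x_range[0]) / cell), n_rows - 1)
        col = min(int((y - y_range[0]) / cell), n_cols - 1)
        # Pick the height band whose half-open interval contains z.
        band = next(i for i in range(n_bands) if z_edges[i] <= z < z_edges[i + 1])
        grid[band][row][col] = max(grid[band][row][col], refl)
    return grid
```

With three bands, the result has the same shape as a three-channel image, which is what lets a 2D detector consume it directly.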

Why it matters

This paper offers a real-time, LiDAR-only approach to 3D pedestrian detection, a core need for autonomous systems that must protect vulnerable road users (VRUs). The compact TriBand-BEV encoding keeps the pipeline fast enough (49 FPS on a single consumer GPU) that the authors describe it as ready for real-time robotic deployment.

Original Abstract

Safe autonomous agents and mobile robots need fast, real-time 3D perception, especially for vulnerable road users (VRUs) such as pedestrians. We introduce a new bird's-eye-view (BEV) encoding, which maps the full 3D LiDAR point cloud into a lightweight 2D BEV tensor with three height bands. We explicitly reformulate 3D detection as a 2D detection problem and then reconstruct 3D boxes from the BEV outputs. A single network detects cars, pedestrians, and cyclists in one pass. The backbone uses area attention at deep stages, a hierarchical bidirectional neck over P1 to P4 fuses context and detail, and the head predicts oriented boxes with distribution focal learning for side offsets and a rotated IoU loss. Training applies a small vertical re-bin and a mild reflectance jitter in channel space to resist memorization. We use an interquartile range (IQR) filter to remove noisy and outlier LiDAR points during 3D reconstruction. On the KITTI dataset, TriBand-BEV attains 58.7/52.6/47.2 pedestrian BEV AP (%) for easy, moderate, and hard at 49 FPS on a single consumer GPU, surpassing Complex-YOLO, with gains of +12.6%, +7.5%, and +3.1%. Qualitative scenes show stable detection under occlusion. The pipeline is compact and ready for real-time robotic deployment. Our source code is publicly available on GitHub.
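
The IQR filtering step mentioned in the abstract can be illustrated with a short sketch. The abstract does not say which quantity is filtered or which fence multiplier is used, so this example assumes the filter is applied to the z-values of the points inside a detected BEV box and uses the conventional 1.5 × IQR fences. The helper names `iqr_filter` and `box_height_extent` are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def iqr_filter(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed default."""
    if len(values) < 4:
        # Too few samples for meaningful quartiles; return them unchanged.
        return list(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]


def box_height_extent(z_values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Estimate (z_center, height) of a box from the filtered z-values.

    Assumes z_values are the heights of LiDAR points that fall inside the
    box footprint predicted in BEV.
    """
    kept = iqr_filter(z_values, k)
    z_min, z_max = min(kept), max(kept)
    return (z_min + z_max) / 2.0, z_max - z_min


# Example: nine plausible pedestrian returns plus one stray high point.
z = [-1.6, -1.4, -1.2, -1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 6.0]
kept = iqr_filter(z)
center, height = box_height_extent(z)
```

In this example the stray return at 6.0 m falls outside the upper fence and is dropped, so the estimated vertical extent comes from the remaining nine points rather than being stretched by the outlier.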
