BEM: Training-Free Background Embedding Memory for False-Positive Suppression in Real-Time Fixed-Background Camera
Junwoo Park, Jangho Lee, Sunho Lim
TLDR
BEM is a training-free module that uses background memory to suppress false positives in real-time fixed-camera object detection, improving precision.
Key contributions
- Introduces BEM, a training-free, lightweight module for fixed-camera object detection.
- Suppresses false positives by using background embedding memory and re-scoring detection logits.
- Maintains high recall and real-time performance across YOLO and RT-DETR families.
- Leverages background-frame cosine similarity as a training-free control signal.
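The control signal in the last bullet can be illustrated with a minimal sketch: comparing a current-frame embedding against a stored background prototype via cosine similarity. The vectors and names here are hypothetical placeholders, not the paper's actual interface.

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors, guarded against
    # zero-norm inputs with a small epsilon.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

# Hypothetical background prototype and current-frame embedding.
background_proto = np.array([0.2, 0.9, 0.1])
frame_embedding = np.array([0.25, 0.85, 0.15])

# High similarity means the frame closely matches the stored background,
# which (per the paper's observation) correlates with fewer objects in view.
sim = cosine_similarity(frame_embedding, background_proto)
```

Because this signal needs no labels or gradient updates, it can gate suppression strength entirely at inference time.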
Why it matters
Pretrained detectors often degrade in real-world fixed-camera deployments, producing excess false positives. BEM offers a practical, training-free way to suppress these errors, improving reliability for critical applications such as surveillance and traffic monitoring without costly retraining or labeled data.
Original Abstract
Pretrained detectors perform well on benchmarks but often suffer performance degradation in real-world deployments due to distribution gaps between training data and target environments. COCO-like benchmarks emphasize category diversity rather than instance density, causing detectors trained under per-class sparsity to struggle in dense, single- or few-class scenes such as surveillance and traffic monitoring. In fixed-camera environments, the quasi-static background provides a stable, label-free prior that can be exploited at inference to suppress spurious detections. To address the issue, we propose Background Embedding Memory (BEM), a lightweight, training-free, weight-frozen module that can be attached to pretrained detectors during inference. BEM estimates clean background embeddings, maintains a prototype memory, and re-scores detection logits with an inverse-similarity, rank-weighted penalty, effectively reducing false positives while maintaining recall. Empirically, background-frame cosine similarity correlates negatively with object count and positively with Precision-Confidence AUC (P-AUC), motivating its use as a training-free control signal. Across YOLO and RT-DETR families on LLVIP and simulated surveillance streams, BEM consistently reduces false positives while preserving real-time performance. Our code is available at https://github.com/Leo-Park1214/Background-Embedding-Memory.git
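The re-scoring step described in the abstract can be sketched roughly as follows. The abstract does not give the exact penalty formula, so this is one plausible reading: detections whose embeddings are most similar to the background memory are ranked first and penalized hardest. All function names, shapes, and the `alpha` parameter are assumptions for illustration, not BEM's actual implementation.

```python
import numpy as np

def rescore(logits, box_embeddings, memory, alpha=0.5):
    """Sketch of an inverse-similarity, rank-weighted penalty.

    logits:         (N,)  detection logits
    box_embeddings: (N, D) per-detection embeddings
    memory:         (M, D) background prototype memory
    """
    # L2-normalize so dot products become cosine similarities.
    e = box_embeddings / (np.linalg.norm(box_embeddings, axis=1, keepdims=True) + 1e-8)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + 1e-8)
    sim = (e @ m.T).max(axis=1)  # (N,) max similarity to any prototype

    # Rank detections by background similarity: most background-like first.
    order = np.argsort(-sim)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(sim))

    # Rank weight decays from 1 (most background-like) to 0 (least),
    # so the penalty concentrates on the likeliest false positives.
    weight = 1.0 - ranks / max(len(sim) - 1, 1)
    return logits - alpha * sim * weight

# Toy usage: two detections, one aligned with the background prototype.
new_logits = rescore(
    logits=np.array([2.0, 2.0]),
    box_embeddings=np.array([[1.0, 0.0], [0.0, 1.0]]),
    memory=np.array([[1.0, 0.0]]),
)
```

In this toy case the first detection matches the background prototype and has its logit reduced, while the orthogonal one is left essentially untouched, which mirrors the stated goal of cutting false positives while preserving recall.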