Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning
Zakarya Elmimouni, Fares Fourati, Mohamed-Slim Alouini
TLDR
This paper proposes a weakly supervised two-stage framework for label-efficient school detection from aerial imagery, significantly reducing annotation needs.
Key contributions
- Proposes a weakly supervised framework for label-efficient school detection from aerial imagery.
- Introduces an automatic labeling pipeline using sparse locations and semantic segmentation.
- Employs a two-stage training process: pretraining on auto-labeled data, fine-tuning on minimal manual labels.
- Achieves strong detection performance in the low-data regime using only 50 manually labeled images, significantly reducing annotation costs.
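
The mask-to-box step of the automatic labeling pipeline can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the segmentation output is a binary grid (1 = infrastructure pixel), that regions are 4-connected, and that the sparse school location has already been projected to pixel coordinates. The function names `mask_to_boxes` and `box_for_point` are hypothetical, as is the rule of picking the box whose centre is nearest the location point.

```python
from collections import deque


def mask_to_boxes(mask):
    """Return one (x_min, y_min, x_max, y_max) box per 4-connected region of 1s.

    `mask` is a list of rows; each row is a list of 0/1 values.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill this region while tracking its pixel extent.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            x_min = x_max = x0
            y_min = y_max = y0
            while queue:
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    inside = 0 <= nx < width and 0 <= ny < height
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x_min, y_min, x_max, y_max))
    return boxes


def box_for_point(boxes, point):
    """Pick the box whose centre is closest to the known location (assumed rule)."""
    if not boxes:
        return None
    px, py = point

    def squared_distance(box):
        cx = (box[0] + box[2]) / 2
        cy = (box[1] + box[3]) / 2
        return (cx - px) ** 2 + (cy - py) ** 2

    return min(boxes, key=squared_distance)


# Toy mask with two separate infrastructure regions.
example_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
example_boxes = mask_to_boxes(example_mask)
example_label = box_for_point(example_boxes, (2, 1))
```

Boxes produced this way are noisy, which is why the framework treats them only as weak labels for pretraining and relies on the small manually labeled set for fine-tuning.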
Why it matters
This framework addresses the critical need for accurate, scalable school mapping by minimizing costly human annotations. It supports global education and connectivity initiatives, especially in low-data regions, making large-scale infrastructure planning more feasible.
Original Abstract
Accurate school detection is essential for supporting education initiatives, including infrastructure planning and expanding internet connectivity to underserved areas. However, many regions around the world face challenges due to outdated, incomplete, or unavailable official records. Manual mapping efforts, while valuable, are labor-intensive and lack scalability across large geographic areas. To address this, we propose a weakly supervised framework for school detection from aerial imagery that minimizes the need for human annotations while supporting global mapping efforts. Our method is specifically designed for low-data regimes, where manual annotations are extremely scarce. We introduce an automatic labeling pipeline that leverages sparse location points and semantic segmentation to generate infrastructure masks, from which we derive bounding boxes. Using these automatically labeled images, we train our detectors in a first training stage to learn a representation of what schools look like. Then, using a small set of manually labeled images, we fine-tune the previously trained models on this clean dataset. This two-stage training pipeline enables strong, large-scale detection of school infrastructure in low-data settings with minimal supervision. Our results demonstrate strong object detection performance, particularly in the low-data regime, where the models achieve promising results using only 50 manually labeled images, significantly reducing the need for costly annotations. This framework supports education and connectivity initiatives worldwide by providing an efficient and extensible approach to mapping schools from space. All models, training code and auto-labeled data will be publicly released to foster future research and real-world impact.