AOI-SSL: Self-Supervised Framework for Efficient Segmentation of Wire-bonded Semiconductors In Optical Inspection
Joaquín Figueira, Rob Van Gastel, Giacomo D'Amicantonio, Zhuoran Liu, Ioan Gabriel Bucur + 2 more
TLDR
AOI-SSL is a self-supervised framework for efficient semantic segmentation of wire-bonded semiconductors, reducing labeled data needs and improving adaptation.
Key contributions
- Introduces AOI-SSL, combining self-supervised pre-training and in-context inference for semiconductor segmentation.
- Masked Autoencoders are most effective for small-data self-supervised pre-training in this industrial domain.
- Proposes patch-level retrieval for mask prediction, enabling near-instant adaptation to single device images.
- Self-supervised pre-training significantly improves segmentation quality over training from scratch or ImageNet.
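The patch-level retrieval idea above can be sketched as a simple nearest-neighbor label transfer over dense encoder embeddings. This is a minimal illustration, not the paper's implementation: the function name, array shapes, and cosine-similarity choice are assumptions, standing in for whatever similarity-based retrieval the authors use.

```python
import numpy as np

def retrieve_patch_masks(query_emb, support_emb, support_labels):
    """Nearest-neighbor patch label transfer via cosine similarity.

    query_emb:      (Nq, D) dense patch embeddings of the query image
    support_emb:    (Ns, D) patch embeddings of labeled support image(s)
    support_labels: (Ns,)   per-patch class labels of the support patches
    """
    # L2-normalize rows so the dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    s = support_emb / np.linalg.norm(support_emb, axis=1, keepdims=True)
    sim = q @ s.T                      # (Nq, Ns) similarity matrix
    nearest = sim.argmax(axis=1)       # most similar support patch per query patch
    return support_labels[nearest]     # (Nq,) predicted per-patch labels
```

Because prediction is just a similarity lookup against stored embeddings, adding a new labeled device image requires no gradient updates, which is what makes near-instant adaptation possible.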
Why it matters
Current semiconductor inspection models are device-specific and require costly retraining. AOI-SSL offers a training-efficient solution by minimizing labeled data needs and enabling rapid adaptation to new devices or shifts. This significantly reduces operational overhead in automated optical inspection.
Original Abstract
Segmentation models in automated optical inspection of wire-bonded semiconductors are typically device-specific and must be re-trained when new devices or distribution shifts appear. We introduce AOI-SSL, a training-efficient framework for semantic segmentation of wire-bonded semiconductors that combines small-domain self-supervised pre-training of vision transformers with in-context inference, minimizing the need for labeled examples. We pre-train SOTA self-supervised algorithms on a small industrial inspection dataset and find that Masked Autoencoders are the most effective in this small-data setting, improving downstream segmentation while reducing the labeled fine-tuning effort. We further introduce in-context, patch-level retrieval methods that predict masks directly from dense encoder embeddings with negligible additional training. We show that, in this setting, simple similarity-based retrieval performs on par with the more complex attention-based aggregation currently used in the literature. Furthermore, our experiments demonstrate that self-supervised pre-training significantly improves segmentation quality compared to training from scratch and to ImageNet-pre-trained backbones under a fixed fine-tuning computational budget. Finally, the results reveal that retrieval-based segmentation outperforms fine-tuning when targeting single device images, allowing for near-instant adaptation to difficult samples.
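The Masked Autoencoder pre-training the abstract highlights rests on one core operation: randomly hiding most patch tokens and asking a decoder to reconstruct them. A minimal sketch of that masking step, assuming ViT-style patch tokens (the function name, shapes, and default 75% ratio are illustrative, not taken from the paper):

```python
import numpy as np

def random_masking(patch_tokens, mask_ratio=0.75, rng=None):
    """Keep a random subset of patch tokens, as in MAE-style pre-training.

    patch_tokens: (N, D) array of patch embeddings for one image
    Returns the visible tokens, their indices, and the masked indices.
    """
    rng = rng if rng is not None else np.random.default_rng()
    n = patch_tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])   # visible patches fed to the encoder
    mask_idx = np.sort(perm[n_keep:])   # patches the decoder must reconstruct
    return patch_tokens[keep_idx], keep_idx, mask_idx
```

Because the reconstruction target comes from the image itself, no labels are needed, which is why this style of pre-training suits a small industrial dataset where annotation is the bottleneck.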