Zero-Shot Learning Through Cross-Modal Transfer
Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng
TLDR
This paper presents a zero-shot learning model that recognizes both seen and unseen object classes by leveraging semantic information from large unsupervised text corpora, without requiring labeled images for the unseen classes or manually defined semantic features.
Key contributions
- Introduces a zero-shot learning framework using distributional semantics from unsupervised text to represent unseen classes.
- Achieves state-of-the-art performance on seen classes with abundant training data while maintaining reasonable accuracy on unseen classes.
- Employs outlier detection in semantic space combined with two separate recognition models, eliminating the need for manually defined semantic features.
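The core of the approach is a learned mapping from image features into the word-vector semantic space, so that images of a class land near that class's word vector. Below is a minimal sketch of such a mapping using a ridge-regularized linear least-squares fit on synthetic data; note this is an illustrative simplification under assumed dimensions and random inputs (the paper itself trains a neural network mapping, and all variable names here are hypothetical).

```python
import numpy as np

# Hypothetical setup: map image feature vectors into a word-vector space
# so each image lands near its class's word vector. A linear map fit by
# regularized least squares stands in for the paper's learned network.
rng = np.random.default_rng(0)

d_img, d_word, n = 128, 50, 500           # assumed feature dims / sample count
word_vecs = rng.normal(size=(3, d_word))  # word vectors for 3 seen classes
labels = rng.integers(0, 3, size=n)       # class label per training image
X = rng.normal(size=(n, d_img))           # image feature vectors
Y = word_vecs[labels]                     # regression target: class word vector

# Closed-form ridge solution: min_W ||X W - Y||^2 + lambda ||W||^2
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(d_img), X.T @ Y)

# Images projected into the semantic space for downstream classification.
mapped = X @ W
```

Once trained, any image (including one from an unseen class) can be pushed through the same map; classification then happens entirely in the semantic space.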
Why it matters
This work advances zero-shot learning by enabling models to recognize new object categories without any labeled images, relying solely on textual semantic knowledge. The approach broadens the applicability of image recognition systems to vast numbers of categories without costly data collection or manual feature engineering, pushing toward more scalable and flexible vision systems.
Original Abstract
This work introduces a model that can recognize objects in images even if no training data is available for the objects. The only necessary knowledge about the unseen categories comes from unsupervised large text corpora. In our zero-shot framework distributional information in language can be seen as spanning a semantic basis for understanding what objects look like. Most previous zero-shot learning models can only differentiate between unseen classes. In contrast, our model can both obtain state of the art performance on classes that have thousands of training images and obtain reasonable performance on unseen classes. This is achieved by first using outlier detection in the semantic space and then two separate recognition models. Furthermore, our model does not require any manually defined semantic features for either words or images.
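The two-stage decision the abstract describes can be sketched as follows: an image already mapped into the semantic space is first checked for being an outlier relative to the seen classes; inliers go to the seen-class model, outliers to nearest-neighbor matching against unseen-class word vectors. This is a toy illustration with assumed 2-D word vectors and a plain distance threshold standing in for the paper's probabilistic outlier detectors.

```python
import numpy as np

# Toy word vectors (2-D for illustration; real word vectors are ~50-D).
seen_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])      # seen-class word vectors
unseen_vecs = np.array([[-1.0, 0.0], [0.0, -1.0]])  # unseen-class word vectors
THRESH = 0.5  # assumed outlier cutoff; the paper uses probabilistic scores

def classify(mapped_img):
    """Return ('seen'|'unseen', class index) for an image feature vector
    that has already been mapped into the semantic space."""
    d_seen = np.linalg.norm(seen_vecs - mapped_img, axis=1)
    if d_seen.min() < THRESH:                 # inlier: use seen-class model
        return "seen", int(d_seen.argmin())
    d_unseen = np.linalg.norm(unseen_vecs - mapped_img, axis=1)
    return "unseen", int(d_unseen.argmin())  # outlier: nearest unseen vector

print(classify(np.array([0.9, 0.1])))   # close to seen class 0 -> ("seen", 0)
print(classify(np.array([-0.9, 0.0])))  # far from all seen -> ("unseen", 0)
```

The threshold trades off accuracy between seen and unseen classes: a stricter cutoff routes more images to the unseen-class model, which is exactly the trade-off the paper's outlier-detection stage manages.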