Zero-Shot Learning Through Cross-Modal Transfer
Richard Socher, Milind Ganjoo, Hamsa Sridhar, Osbert Bastani, Christopher D. Manning, Andrew Y. Ng
TLDR
This paper presents a zero-shot learning model that recognizes both seen and unseen object classes by leveraging semantic information from large unsupervised text corpora, without requiring labeled images for the unseen classes or manually defined semantic features.
Key contributions
- Introduces a zero-shot learning framework using distributional semantics from unsupervised text to represent unseen classes.
- Achieves state-of-the-art performance on seen classes with abundant training data while maintaining reasonable accuracy on unseen classes.
- Employs outlier detection in semantic space combined with two separate recognition models, eliminating the need for manually defined semantic features.
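The core of the approach is a learned mapping from image features into the word-vector semantic space, so that images of a class land near that class's word vector. Below is a minimal sketch of such a mapping using a ridge-regularized linear least-squares fit on synthetic data; note this is an illustrative simplification under assumed dimensions and random inputs (the paper itself trains a neural network mapping, and all variable names here are hypothetical).

```python
import numpy as np

# Hypothetical setup: map image feature vectors into a word-vector space
# so each image lands near its class's word vector. A linear map fit by
# regularized least squares stands in for the paper's learned network.
rng = np.random.default_rng(0)

d_img, d_word, n = 128, 50, 500           # assumed feature dims / sample count
word_vecs = rng.normal(size=(3, d_word))  # word vectors for 3 seen classes
labels = rng.integers(0, 3, size=n)       # class label per training image
X = rng.normal(size=(n, d_img))           # image feature vectors
Y = word_vecs[labels]                     # regression target: class word vector

# Closed-form ridge solution: min_W ||X W - Y||^2 + lambda ||W||^2
W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(d_img), X.T @ Y)

# Images projected into the semantic space for downstream classification.
mapped = X @ W
```

Once trained, any image (including one from an unseen class) can be pushed through the same map; classification then happens entirely in the semantic space.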
Why it matters
This work advances zero-shot learning by enabling models to recognize new object categories without any labeled images, relying solely on textual semantic knowledge. The approach broadens the applicability of image recognition systems to vast numbers of categories without costly data collection or manual feature engineering, pushing toward more scalable and flexible vision systems.
Original Abstract
This work introduces a model that can recognize objects in images even if no training data is available for the objects. The only necessary knowledge about the unseen categories comes from unsupervised large text corpora. In our zero-shot framework distributional information in language can be seen as spanning a semantic basis for understanding what objects look like. Most previous zero-shot learning models can only differentiate between unseen classes. In contrast, our model can both obtain state of the art performance on classes that have thousands of training images and obtain reasonable performance on unseen classes. This is achieved by first using outlier detection in the semantic space and then two separate recognition models. Furthermore, our model does not require any manually defined semantic features for either words or images.
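The two-stage decision the abstract describes can be sketched as follows: an image already mapped into the semantic space is first checked for being an outlier relative to the seen classes; inliers go to the seen-class model, outliers to nearest-neighbor matching against unseen-class word vectors. This is a toy illustration with assumed 2-D word vectors and a plain distance threshold standing in for the paper's probabilistic outlier detectors.

```python
import numpy as np

# Toy word vectors (2-D for illustration; real word vectors are ~50-D).
seen_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])      # seen-class word vectors
unseen_vecs = np.array([[-1.0, 0.0], [0.0, -1.0]])  # unseen-class word vectors
THRESH = 0.5  # assumed outlier cutoff; the paper uses probabilistic scores

def classify(mapped_img):
    """Return ('seen'|'unseen', class index) for an image feature vector
    that has already been mapped into the semantic space."""
    d_seen = np.linalg.norm(seen_vecs - mapped_img, axis=1)
    if d_seen.min() < THRESH:                 # inlier: use seen-class model
        return "seen", int(d_seen.argmin())
    d_unseen = np.linalg.norm(unseen_vecs - mapped_img, axis=1)
    return "unseen", int(d_unseen.argmin())  # outlier: nearest unseen vector

print(classify(np.array([0.9, 0.1])))   # close to seen class 0 -> ("seen", 0)
print(classify(np.array([-0.9, 0.0])))  # far from all seen -> ("unseen", 0)
```

The threshold trades off accuracy between seen and unseen classes: a stricter cutoff routes more images to the unseen-class model, which is exactly the trade-off the paper's outlier-detection stage manages.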