Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
TLDR
This paper introduces a unified text-to-text framework for transfer learning in NLP, systematically compares pre-training and fine-tuning choices, and combines the resulting insights with scale and a new large corpus to achieve state-of-the-art results across diverse language tasks.
Key contributions
- Proposes a unified text-to-text format that casts every text-based NLP task as mapping an input string to an output string (see the sketch after this list).
- Systematically compares various pre-training objectives, architectures, data sets, and transfer methods.
- Introduces the Colossal Clean Crawled Corpus (C4) and releases pre-trained models and code for community use.
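As a concrete illustration of the text-to-text format, the sketch below shows how a few representative tasks become plain input/output string pairs handled by one encoder-decoder model with a single maximum-likelihood objective. The task prefixes follow the convention described in the paper; the specific example texts and the Python structure are illustrative, not taken from the released code.

```python
# Sketch of the unified text-to-text format: every task is posed as mapping
# an input string (with a task-specific prefix) to an output string.
examples = [
    # Machine translation (WMT English-German)
    {"input": "translate English to German: That is good.",
     "target": "Das ist gut."},
    # Text classification (CoLA linguistic acceptability)
    {"input": "cola sentence: The course is jumping well.",
     "target": "not acceptable"},
    # Regression (STS-B similarity, with the score emitted as text)
    {"input": "stsb sentence1: The rhino grazed on the grass. "
              "sentence2: A rhino is grazing in a field.",
     "target": "3.8"},
    # Abstractive summarization (e.g. CNN/Daily Mail)
    {"input": "summarize: state authorities dispatched emergency crews "
              "tuesday to survey the damage after an onslaught of severe "
              "weather in mississippi ...",
     "target": "six people hospitalized after a storm in attala county."},
]

# One model, one training objective (maximum likelihood on the target
# tokens), and one decoding procedure serve all of the tasks above.
for ex in examples:
    print(f"{ex['input']!r}\n  -> {ex['target']!r}\n")
```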
Why it matters
By unifying diverse NLP tasks under a single text-to-text paradigm and rigorously analyzing transfer learning components, this work simplifies and advances the development of versatile language models. The large-scale clean dataset and open resources further empower researchers to build upon these findings, accelerating progress in natural language understanding and generation.
Original Abstract
Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts all text-based language problems into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled data sets, transfer approaches, and other factors on dozens of language understanding tasks. By combining the insights from our exploration with scale and our new "Colossal Clean Crawled Corpus", we achieve state-of-the-art results on many benchmarks covering summarization, question answering, text classification, and more. To facilitate future work on transfer learning for NLP, we release our data set, pre-trained models, and code.
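Among the pre-training objectives compared in that study, the paper adopts a span-corruption denoising objective for its baseline: random contiguous spans of the unlabeled input are replaced by unique sentinel tokens, and the model is trained to reproduce the dropped spans. The sketch below is a minimal illustration of that idea; the helper function and sentinel spellings are hypothetical, and the example sentence mirrors the one used in the paper's figures.

```python
def span_corrupt(tokens, spans):
    """Minimal sketch of span-corruption denoising: each (start, end) span of
    `tokens` is replaced in the input by a unique sentinel token, and the
    target lists every sentinel followed by the tokens it replaced."""
    sentinels = ["<X>", "<Y>", "<Z>"]  # placeholder mask tokens, one per span
    inputs, targets, prev_end = [], [], 0
    for i, (start, end) in enumerate(spans):
        inputs += tokens[prev_end:start] + [sentinels[i]]
        targets += [sentinels[i]] + tokens[start:end]
        prev_end = end
    inputs += tokens[prev_end:]
    targets += [sentinels[len(spans)]]  # final sentinel marks end of target
    return " ".join(inputs), " ".join(targets)

tokens = "Thank you for inviting me to your party last week .".split()
corrupted_input, target = span_corrupt(tokens, spans=[(2, 4), (8, 9)])
print(corrupted_input)  # Thank you <X> me to your party <Y> week .
print(target)           # <X> for inviting <Y> last <Z>
```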