OPT: Open Pre-trained Transformer Language Models
Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen + 14 more
TLDR
OPT is a suite of openly shared decoder-only transformer language models, the largest of which is comparable to GPT-3 while requiring a fraction of the carbon footprint to develop.
Key contributions
- Introduced OPT, a suite of decoder-only pre-trained transformers ranging from 125 million to 175 billion parameters, shared with interested researchers for study and experimentation.
- Demonstrated that OPT-175B performs comparably to GPT-3 while requiring only one-seventh of the carbon footprint to develop.
- Released a logbook detailing the infrastructure challenges faced during training, along with code for experimenting with all of the released models (see the sketch below).
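As a quick illustration of what that access looks like in practice, here is a minimal sketch of loading a small OPT checkpoint and generating text. It assumes the checkpoints mirrored on the Hugging Face Hub (e.g. facebook/opt-125m); the authors' own experimentation code is released separately in the metaseq repository, so treat this as an illustrative path rather than the official workflow.

```python
# Minimal sketch: generate text with a small OPT checkpoint.
# Assumes the Hugging Face Hub mirror "facebook/opt-125m"; the paper's
# official code release lives in the metaseq repository, not this library.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # smallest model in the OPT suite
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Large language models are"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same interface works for the larger checkpoints in the suite (up to facebook/opt-66b on the Hub, with OPT-175B available on request), with memory being the main practical constraint.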
Why it matters
By providing openly available weights and training details, OPT democratizes access to state-of-the-art large language models, enabling researchers to study, improve, and build upon them without prohibitive computational or environmental costs. This transparency promotes responsible AI development and fosters innovation in natural language processing.
Original Abstract
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning. Given their computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, no access is granted to the full model weights, making them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we aim to fully and responsibly share with interested researchers. We show that OPT-175B is comparable to GPT-3, while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook detailing the infrastructure challenges we faced, along with code for experimenting with all of the released models.