LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux + 9 more
TLDR
LLaMA is a suite of open, efficient foundation language models that achieves state-of-the-art performance while training only on publicly available data.
Key contributions
- Developed foundation language models ranging from 7B to 65B parameters trained on trillions of tokens.
- Demonstrated that LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, despite being over 10× smaller and trained only on publicly available datasets.
- Released all models openly to the research community, promoting transparency and accessibility.
Why it matters
This paper is significant because it challenges the notion that cutting-edge language models require proprietary data and massive parameter counts, showing that competitive performance can be achieved with open datasets and more efficient training. By releasing these models publicly, LLaMA democratizes access to powerful language models, enabling broader research and innovation in natural language processing.
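To make the accessibility point concrete, here is a minimal sketch of how a researcher might run one of the open LLaMA checkpoints for inference, assuming the released weights have been converted to the Hugging Face Transformers format; the checkpoint id and generation settings below are illustrative assumptions, not artifacts described in the paper.

```python
# Minimal sketch (not from the paper): loading a community-converted LLaMA-7B
# checkpoint with Hugging Face Transformers. The model id below is an assumed
# third-party mirror; the official weights are obtained from Meta and converted
# to this format separately.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # assumed community mirror, not an official release artifact

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's stored precision
    device_map="auto",    # requires `accelerate`; spreads layers over available GPUs/CPU
)

prompt = "The theory of relativity states that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```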
Original Abstract
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.