LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux + 9 more
TLDR
LLaMA is a suite of open, efficient foundation language models that achieves state-of-the-art performance while training only on publicly available data.
Key contributions
- Developed foundation language models ranging from 7B to 65B parameters trained on trillions of tokens.
- Demonstrated that LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, despite being over 10× smaller and trained only on publicly available datasets.
- Released all models openly to the research community, promoting transparency and accessibility.
Why it matters
This paper is significant because it challenges the notion that cutting-edge language models require proprietary data and massive parameter counts, showing that competitive performance can be achieved with open datasets and more efficient training. By releasing these models publicly, LLaMA democratizes access to powerful language models, enabling broader research and innovation in natural language processing.
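To make the accessibility point concrete, here is a minimal sketch of how a researcher might run one of the open LLaMA checkpoints for inference, assuming the released weights have been converted to the Hugging Face Transformers format; the checkpoint id and generation settings below are illustrative assumptions, not artifacts described in the paper.

```python
# Minimal sketch (not from the paper): loading a community-converted LLaMA-7B
# checkpoint with Hugging Face Transformers. The model id below is an assumed
# third-party mirror; the official weights are obtained from Meta and converted
# to this format separately.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # assumed community mirror, not an official release artifact

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # load in the checkpoint's stored precision
    device_map="auto",    # requires `accelerate`; spreads layers over available GPUs/CPU
)

prompt = "The theory of relativity states that"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```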
Original Abstract
We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.