ArXiv TLDR

The Llama 3 Herd of Models

arXiv:2407.21783

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian + 556 more

cs.AI cs.CL cs.CV

TLDR

Llama 3 is a new family of large multilingual foundation models excelling in language, coding, reasoning, and multimodal tasks, rivaling GPT-4 in quality and offering extensive public releases.

Key contributions

  • Introduces Llama 3, a herd of dense Transformer language models with up to 405B parameters and 128K-token context windows.
  • Demonstrates comparable performance to GPT-4 across diverse language, coding, and reasoning benchmarks.
  • Explores compositional integration of image, video, and speech capabilities, achieving competitive multimodal recognition results.

Why it matters

Llama 3 delivers a scalable, versatile, and publicly released foundation-model family that supports multilingual and multimodal tasks. By matching state-of-the-art performance and extending into new modalities through a compositional approach, it broadens opportunities for AI research, deployment, and safety work.

Original Abstract

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
