ArXiv TLDR

The Llama 3 Herd of Models

arXiv:2407.21783

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian + 556 more

cs.AI cs.CL cs.CV

TLDR

Llama 3 is a new family of large multilingual foundation models excelling in language, coding, reasoning, and multimodal tasks, rivaling GPT-4 in quality and offering extensive public releases.

Key contributions

  • Introduces Llama 3, a herd of dense Transformer language models with up to 405B parameters and 128K-token context windows.
  • Demonstrates comparable performance to GPT-4 across diverse language, coding, and reasoning benchmarks.
  • Explores compositional integration of image, video, and speech capabilities, achieving competitive multimodal recognition results.

Why it matters

Llama 3 delivers a scalable, versatile, and publicly released foundation-model family that supports multilingual and multimodal tasks. By matching state-of-the-art performance and extending into new modalities through a compositional approach, it broadens opportunities for AI research, deployment, and safety work.

Original Abstract

Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development.
