Sten Sootla

3 papers · Latest: May 5, 2026

ProgramBench: Can Language Models Rebuild Programs From Scratch?

ProgramBench evaluates language models' ability to holistically rebuild software from scratch, revealing that current LMs struggle with architectural decisions.

2605.03546 · May 5, 2026

Artificial Intelligence

The Llama 3 Herd of Models

Llama 3 is a new family of large multilingual foundation models excelling in language, coding, reasoning, and multimodal tasks, rivaling GPT-4 in quality and offering extensive public releases.

2407.21783 · Jul 31, 2024

Natural Language Processing

Code Llama: Open Foundation Models for Code

Code Llama is a new family of open-source large language models specialized for coding tasks, achieving state-of-the-art results on multiple benchmarks with support for long contexts and code infilling.

2308.12950 · Aug 24, 2023
