John Yang

2 papers · Latest: May 5, 2026

ProgramBench: Can Language Models Rebuild Programs From Scratch?

ProgramBench evaluates language models' ability to holistically rebuild software from scratch, revealing that current LMs struggle with architectural decisions.

2605.03546 · May 5, 2026

Artificial Intelligence

SWE-chat: Coding Agent Interactions From Real Users in the Wild

The SWE-chat dataset captures real-world coding agent usage, showing inefficiencies, security risks, and user interaction patterns in developer workflows.

2604.20779 · Apr 22, 2026
