John Yang
2 papers · Latest:
Software Engineering
ProgramBench: Can Language Models Rebuild Programs From Scratch?
ProgramBench evaluates language models' ability to holistically rebuild software from scratch, revealing that current LMs struggle with architectural decisions.
2605.03546
Artificial Intelligence
SWE-chat: Coding Agent Interactions From Real Users in the Wild
The SWE-chat dataset reveals real-world coding agent usage, showing inefficiencies, security risks, and user interaction patterns in developer workflows.
2604.20779