William Walden

2 papers · Latest: May 6, 2026

DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation

DoGMaTiQ automates the generation of high-quality, QA-based "nuggets" for evaluating RAG reports, showing strong correlation with human judgments.

2605.04458 · May 6, 2026

Software Engineering

Can Coding Agents Reproduce Findings in Computational Materials Science?

AutoMat benchmarks the ability of LLM coding agents to reproduce computational materials science findings, revealing that current agents achieve low success rates.

2605.00803 · May 1, 2026
