William Jurayj

2 papers · Latest: May 1, 2026

Can Coding Agents Reproduce Findings in Computational Materials Science?

AutoMat benchmarks how well LLM coding agents reproduce findings in computational materials science and finds that current agents achieve low success rates.

2605.00803 · May 1, 2026

Natural Language Processing

Many-Tier Instruction Hierarchy in LLM Agents

This paper introduces ManyIH, a new paradigm, and ManyIH-Bench, its accompanying benchmark, to help LLM agents resolve conflicts among instructions from many sources.

2604.09443 · Apr 10, 2026
