Benjamin Van Durme
3 papers · Latest:
Software Engineering
Can Coding Agents Reproduce Findings in Computational Materials Science?
AutoMat benchmarks LLM coding agents' ability to reproduce computational materials science findings, revealing current agents achieve low success rates.
2605.00803
Information Retrieval
A Replicability Study of XTR
This study replicates XTR, finding its training improves efficient retrieval engines like PLAID and WARP, despite no overall effectiveness gain over ColBERT.
2605.00646
Natural Language Processing
Many-Tier Instruction Hierarchy in LLM Agents
This paper introduces ManyIH, a new paradigm and benchmark (ManyIH-Bench) to help LLM agents resolve conflicts from many instruction sources.
2604.09443