Maliheh Izadi
2 papers · Latest:
Software Engineering
Evaluating Non-English Developer Support in Machine Learning for Software Engineering
Non-English developer support in ML for software engineering is severely lacking, with generation and evaluation methods failing for multilingual code.
2605.05902
Artificial Intelligence
Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges
DeepRed is a new benchmark for evaluating LLM agents in realistic Capture The Flag (CTF) challenges, revealing current agents' limited capabilities.
2604.19354