Maliheh Izadi

2 papers · Latest: May 7, 2026

Evaluating Non-English Developer Support in Machine Learning for Software Engineering

Non-English developer support in ML for software engineering is severely lacking, with generation and evaluation methods failing for multilingual code.

2605.05902 · May 7, 2026

Artificial Intelligence

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges

DeepRed is a new benchmark for evaluating LLM agents in realistic Capture The Flag (CTF) challenges, revealing current agents' limited capabilities.

2604.19354 · Apr 21, 2026
