Ming Hu

2 papers · Latest: April 23, 2026

Assessing the Impact of Requirement Ambiguity on LLM-based Function-Level Code Generation

This paper introduces Orchid, a new benchmark with ambiguous requirements, revealing that ambiguity significantly degrades LLM code generation performance.

2604.21505 · Apr 23, 2026

Computer Vision

MedProbeBench: Systematic Benchmarking at Deep Evidence Integration for Expert-level Medical Guideline

MedProbeBench is a new benchmark evaluating LLMs' deep evidence integration for generating expert-level medical guidelines, revealing current limitations.

2604.18418 · Apr 20, 2026
