Ming Hu
2 papers · Latest:
Software Engineering
Assessing the Impact of Requirement Ambiguity on LLM-based Function-Level Code Generation
This paper introduces Orchid, a new benchmark with ambiguous requirements, revealing that ambiguity significantly degrades LLM code generation performance.
2604.21505
Computer Vision
MedProbeBench: Systematic Benchmarking at Deep Evidence Integration for Expert-level Medical Guideline
MedProbeBench is a new benchmark evaluating LLMs' deep evidence integration for generating expert-level medical guidelines, revealing current limitations.
2604.18418