Kechi Zhang
2 papers · Latest:
Software Engineering
RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices
RealBench is a new benchmark for repo-level code generation, using structured designs (UML) to better align LLM evaluation with real-world software development.
2604.22659
Natural Language Processing
Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy
ChomskyBench evaluates LLM formal reasoning across the Chomsky Hierarchy, revealing performance stratification and severe efficiency barriers for complex tasks.
2604.02709