Kechi Zhang

2 papers · Latest: April 24, 2026

RealBench: A Repo-Level Code Generation Benchmark Aligned with Real-World Software Development Practices

RealBench is a new benchmark for repo-level code generation, using structured designs (UML) to better align LLM evaluation with real-world software development.

2604.22659 · Apr 24, 2026

Natural Language Processing

Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy

ChomskyBench evaluates LLM formal reasoning across the Chomsky Hierarchy, revealing performance stratification and severe efficiency barriers for complex tasks.

2604.02709 · Apr 3, 2026
