Zhengyang Tang
2 papers · Latest:
Software Engineering
Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows
Claw-Eval-Live is a live benchmark for LLM agents, evaluating their performance on evolving real-world workflows with verifiable execution.
2604.28139
Natural Language Processing
Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning
STOP introduces a learnable internal path pruning method for Large Reasoning Models, significantly improving efficiency and accuracy in parallel reasoning.
2604.16029