Zhengyang Tang

2 papers · Latest: April 30, 2026

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

Claw-Eval-Live is a live benchmark for LLM agents, evaluating their performance on evolving real-world workflows with verifiable execution.

2604.28139 · Apr 30, 2026

Natural Language Processing

Cut Your Losses! Learning to Prune Paths Early for Efficient Parallel Reasoning

STOP introduces a learnable internal path pruning method for Large Reasoning Models, significantly improving efficiency and accuracy in parallel reasoning.

2604.16029 · Apr 17, 2026
