Jiaqi Wang
7 papers · Latest:
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation
WildClawBench is a benchmark for evaluating long-horizon agents in real-world settings, using native runtimes and real tools.
Near-Future Policy Optimization
NPO and AutoNPO enhance Reinforcement Learning with Verifiable Rewards (RLVR) by leveraging near-future policy checkpoints for improved off-policy learning.
On-chain Peak Shaving
This paper studies "on-chain peak shaving," the practice of scheduling Ethereum transactions for off-peak hours to reduce gas fees, and finds that firms adopt varied strategies and achieve cost savings.
UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models
UDM-GRPO combines Uniform Discrete Diffusion Models with reinforcement learning, using new insights to make policy optimization stable and efficient, and achieves state-of-the-art results.
Adaptive Spiking Neurons for Vision and Language Modeling
This paper introduces Adaptive Spiking Neurons (ASN/NASN) for efficient, high-performance spiking neural networks (SNNs) across diverse vision and language tasks.
Winner-Take-All Spiking Transformer for Language Modeling
This paper introduces Winner-Take-All Spiking Transformers (WTA-ST) for energy-efficient language modeling, replacing the costly softmax with a new spike-driven attention mechanism.
Self-Distilled RLVR
RLSD combines RLVR with self-distillation to provide fine-grained updates and reliable update directions, improving the stability and convergence of LLM training.