Jiaqi Wang

7 papers · Latest: May 11, 2026

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

WildClawBench is a benchmark for evaluating long-horizon agents on real-world tasks, using native runtimes and real tools.

2605.10912 · May 11, 2026

Machine Learning

Near-Future Policy Optimization

NPO and AutoNPO enhance Reinforcement Learning with Verifiable Rewards (RLVR) by leveraging near-future policy checkpoints for improved off-policy learning.

2604.20733 · Apr 22, 2026

Econometrics

On-chain Peak Shaving

This paper studies "on-chain peak shaving," the practice of scheduling Ethereum transactions into off-peak hours to reduce gas fees, and finds that firms follow varied strategies and achieve cost savings.

2604.19956 · Apr 21, 2026

Computer Vision

UDM-GRPO: Stable and Efficient Group Relative Policy Optimization for Uniform Discrete Diffusion Models

UDM-GRPO integrates Uniform Discrete Diffusion Models with reinforcement learning, making Group Relative Policy Optimization stable and efficient for these models and achieving state-of-the-art results.

2604.18518 · Apr 20, 2026

Neural & Evolutionary Computing

Adaptive Spiking Neurons for Vision and Language Modeling

This paper introduces Adaptive Spiking Neurons (ASN/NASN) for building efficient, high-performance spiking neural networks (SNNs) across diverse vision and language tasks.

2604.12365 · Apr 14, 2026

Neural & Evolutionary Computing

Winner-Take-All Spiking Transformer for Language Modeling

This paper introduces Winner-Take-All Spiking Transformers (WTA-ST) for energy-efficient language modeling, replacing the costly softmax with a novel spike-driven attention mechanism.

2604.11321 · Apr 13, 2026

Machine Learning

Self-Distilled RLVR

RLSD combines RLVR with self-distillation to provide fine-grained updates with reliable update directions, improving the stability and convergence of LLM training.

2604.03128 · Apr 3, 2026
