Nanyun Peng

2 papers · Latest: May 12, 2026

LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues

LongMemEval-V2 introduces a benchmark for evaluating how well long-term agent memory acquires environment-specific experience in web environments.

OpenVLThinkerV2 introduces Gaussian GRPO and task-level shaping to create a robust multimodal reasoning model, outperforming existing models.
