ArXiv TLDR

Jian Yang

8 papers · Latest:

Robotics

Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation

VISER is a new visually realistic benchmark for robot manipulation that bridges the sim-to-real gap with high-fidelity assets, and its simulation results correlate strongly with real-world performance.

2605.06311
Computer Vision

Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy

HiPR improves 3D occupancy prediction by adaptively reparameterizing the projection space with height-guided LiDAR features, achieving SOTA performance.

2605.05072
Natural Language Processing

FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios

FinSafetyBench is a new bilingual red-teaming benchmark that evaluates LLM safety and compliance in real-world financial scenarios and reveals vulnerabilities in the models it tests.

2605.00706
Natural Language Processing

ClawGym: A Scalable Framework for Building Effective Claw Agents

ClawGym introduces a scalable framework for developing Claw-style agents, including a synthetic dataset, trained models, and an evaluation benchmark.

2604.26904
Natural Language Processing

A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression

TACO is a self-evolving framework that efficiently compresses observational context for terminal agents, reducing token costs and improving performance.

2604.19572
Machine Learning

From Imitation to Discrimination: Progressive Curriculum Learning for Robust Web Navigation

This paper introduces the Triton dataset and a progressive curriculum for robust web navigation, achieving SOTA performance and surpassing large language models.

2604.12666

InCoder-32B-Thinking: Industrial Code World Model for Thinking

InCoder-32B-Thinking generates expert reasoning traces for industrial code by combining error-driven chain-of-thought with a hardware-aware world model.

2604.03144
Natural Language Processing

Qwen Technical Report

Qwen is a versatile large language model series featuring base, chat, coding, and math-specialized models that achieve strong performance across diverse AI tasks, rivaling larger and proprietary models.

2309.16609
