Jian Yang
8 papers · Latest:
Toward Visually Realistic Simulation: A Benchmark for Evaluating Robot Manipulation in Simulation
VISER is a new visually realistic benchmark for robot manipulation, bridging the sim-to-real gap with high-fidelity assets and strong real-world correlation.
Height-Guided Projection Reparameterization for Camera-LiDAR Occupancy
HiPR enhances 3D occupancy prediction by adaptively reparameterizing projection space using height-guided LiDAR features, achieving SOTA performance.
FinSafetyBench: Evaluating LLM Safety in Real-World Financial Scenarios
FinSafetyBench is a new bilingual red-teaming benchmark evaluating LLM safety and compliance in real-world financial scenarios, revealing vulnerabilities.
ClawGym: A Scalable Framework for Building Effective Claw Agents
ClawGym introduces a scalable framework for developing Claw-style agents, including a synthetic dataset, trained models, and an evaluation benchmark.
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression
TACO is a self-evolving framework that efficiently compresses observational context for terminal agents, reducing token costs and improving performance.
From Imitation to Discrimination: Progressive Curriculum Learning for Robust Web Navigation
This paper introduces the Triton dataset and a progressive curriculum for robust web navigation, achieving state-of-the-art performance and surpassing large language models.
InCoder-32B-Thinking: Industrial Code World Model for Thinking
InCoder-32B-Thinking generates expert reasoning traces for industrial code by combining error-driven chain-of-thought with a hardware-aware world model.
Qwen Technical Report
Qwen is a versatile large language model series featuring base, chat, coding, and math-specialized models that achieve strong performance across diverse AI tasks, rivaling larger and proprietary models.