Jiaheng Liu
2 papers · Latest:
Software Engineering
WebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
WebCompass is a new multimodal benchmark for evaluating large language models' end-to-end web coding capabilities across generation, editing, and repair tasks.
2604.18224
Software Engineering
CodeTracer: Towards Traceable Agent States
CodeTracer helps debug complex code agents by tracing full state transitions and localizing hidden error chains, improving reliability.
2604.11641