Haoyang Huang
4 papers ยท Latest:
Computer Vision
OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation
OmniNFT proposes a novel diffusion RL framework to improve joint audio-video generation by addressing multi-modal challenges like gradient imbalance.
2605.12480
Software EngineeringWebCompass: Towards Multimodal Web Coding Evaluation for Code Language Models
WebCompass is a new multimodal benchmark for evaluating large language models' end-to-end web coding capabilities across generation, editing, and repair tasks.
2604.18224
Software EngineeringCodeTracer: Towards Traceable Agent States
CodeTracer helps debug complex code agents by tracing full state transitions and localizing hidden error chains, improving reliability.
2604.11641
Natural Language ProcessingOpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence
OpenSpatial is an open-source data engine and 3M-sample dataset that significantly improves spatial reasoning models, achieving SOTA performance.
2604.07296
๐ฌ Weekly AI Paper Digest
Get the top 10 AI/ML arXiv papers from the week โ summarized, scored, and delivered to your inbox every Monday.