Xiangyu Yue
4 papers · Latest:
Computer Vision
From Web to Pixels: Bringing Agentic Search into Visual Perception
This paper introduces WebEye, a benchmark, and Pixel-Searcher, a model, for visual perception tasks requiring external knowledge and agentic search.
2605.12497
Computer Vision
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
OpenSearch-VL provides an open-source recipe for training frontier multimodal deep search agents, achieving state-of-the-art performance.
2605.05185
Software Engineering
OpenGame: Open Agentic Coding for Games
OpenGame is an open-source agentic framework that uses a specialized LLM and skills to generate fully playable web games, achieving state-of-the-art results.
2604.18394
Machine Learning
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
MODPO is a novel, RL-free method for aligning language models to multiple human preferences simultaneously, achieving stable and efficient optimization across diverse objectives.
2310.03708