Kaituo Feng
4 papers · Latest:
Computer Vision
From Web to Pixels: Bringing Agentic Search into Visual Perception
This paper introduces WebEye, a benchmark, and Pixel-Searcher, a model, for visual perception tasks requiring external knowledge and agentic search.
2605.12497
Computer Vision
Flow-OPD: On-Policy Distillation for Flow Matching Models
Flow-OPD introduces an on-policy distillation framework for Flow Matching text-to-image models, resolving multi-task alignment issues.
2605.08063
Computer Vision
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents
OpenSearch-VL provides an open-source recipe for training frontier multimodal deep search agents, achieving state-of-the-art performance.
2605.05185
Software Engineering
OpenGame: Open Agentic Coding for Games
OpenGame is an open-source agentic framework that uses a specialized LLM and skills to generate fully playable web games, achieving state-of-the-art results.
2604.18394