Yige Xu
2 papers · Latest:
Computer Vision
Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks
Introduces StepSTEM, a new benchmark and evaluation framework for fine-grained, cross-modal STEM reasoning in MLLMs, and shows that current models struggle on it.
2604.19697
Computer Vision
Do Vision-Language Models Truly Perform Vision Reasoning? A Rigorous Study of the Modality Gap
Current VLMs reason primarily in textual space and show a performance gap in vision-grounded reasoning, which the CrossMath benchmark and fine-tuning address.
2604.16256