Yige Xu

2 papers · Latest: April 21, 2026

Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks

Introduces StepSTEM, a new benchmark and evaluation framework for fine-grained, cross-modal STEM reasoning in MLLMs, revealing that current models struggle.

2604.19697 · Apr 21, 2026

Computer Vision

Do Vision-Language Models Truly Perform Vision Reasoning? A Rigorous Study of the Modality Gap

Current VLMs primarily reason in textual space, showing a performance gap in vision-grounded reasoning, which the CrossMath benchmark and fine-tuning address.

2604.16256 · Apr 17, 2026
