Qianqian Xie
2 papers · Latest:
Computer Vision
SiMing-Bench: Evaluating Procedural Correctness from Continuous Interactions in Clinical Skill Videos
SiMing-Bench evaluates MLLMs' ability to judge procedural correctness from continuous interactions in clinical skill videos, revealing current models' limitations.
2604.09037
Computer Vision
Appear2Meaning: A Cross-Cultural Benchmark for Structured Cultural Metadata Inference from Images
Introduces Appear2Meaning, a cross-cultural benchmark to evaluate VLMs' ability to infer structured cultural metadata from images, revealing current limitations.
2604.07338