Wenzhao Zheng

3 papers · Latest: May 8, 2026

Proxy3D: Efficient 3D Representations for Vision-Language Models via Semantic Clustering and Alignment

Proxy3D introduces efficient 3D representations for Vision-Language Models by using semantic-aware clustering of scene features from video frames.

2605.08064 · May 8, 2026

Computer Vision

BAMI: Training-Free Bias Mitigation in GUI Grounding

BAMI is a training-free method that uses coarse-to-fine focus and candidate selection to mitigate precision and ambiguity biases in GUI grounding models.

2605.06664 · May 7, 2026

Computer Vision

UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection

UniGenDet is a unified generative-discriminative framework that co-evolves image generation and detection, achieving state-of-the-art performance.

2604.21904 · Apr 23, 2026
